Automation

Posted: 22 Jul 2011, 10:08
by Axel Arnold Bangert
Dear supporters,
how can ScanTailor be automated? I have programmed a Windows scanning application for the V-Scanner which automates
OmniPage OCR, Adobe OCR, or Tsinghua TH-OCR (Word 2007 Imaging Component) during each scanning process.
The Abbyy OCR SDK delivers the best OCR automation features (full automation) and the best OCR quality. It
works very well and delivers fast, good results together with my own very simple image processing routines.

After each scan, the new page is automatically appended to CompleteBook.pdf via the open-source iText PDF SDK, which
is very fast and reliable (in theory, the software speed would allow 1500 pages per hour).
Image
The results would be much better if I could use the image processing routines of ScanTailor. But the only way I have
found to automate ScanTailor is to start a hidden instance programmatically and control it via the SendKey methods.
That is not a good way of coding.

Do you know a better way to automate ScanTailor? Or would it perhaps be possible to develop some COM DLLs
that could be used for Windows automation?

Axel Arnold Bangert - Herzogenrath 2011

Re: Automation

Posted: 22 Jul 2011, 12:02
by daniel_reetz
Welcome, Axel! Did you know there is a command-line version of Scan Tailor?

Here is a thread announcing the release of command-line Scan Tailor.
Here is a thread about building the CLI branch on Windows.

Re: Automation

Posted: 22 Jul 2011, 12:16
by Axel Arnold Bangert
daniel_reetz wrote:Welcome, Axel! Did you know there is a command-line version of Scan Tailor?

Here is a thread announcing the release of command-line Scan Tailor.
Here is a thread about building the CLI branch on Windows.
Thanks a lot - command line is fine.
Axel

Re: Automation

Posted: 22 Jul 2011, 12:29
by daniel_reetz
Please let us know how your work with the CLI version goes - I'd love to see your postprocessor in action!

Also, the Scan Tailor developers' mailing list is a good place to talk about issues with the CLI version. Tulon is not the main developer on the CLI branch, and the main author is not a member here.

Re: Automation

Posted: 22 Jul 2011, 14:19
by Tulon
The CLI branch was actually merged into the main branch some time ago. So, if you are on Windows, you can just download the 0.9.10rc1 and you'll find scantailor-cli.exe inside.
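
For anyone who wants to call scantailor-cli.exe from their own program instead of driving the GUI with SendKeys, here is a minimal Python sketch. It assumes the CLI takes its arguments in the order options, input images, output directory, so please check the tool's own help output before relying on that. The function names, the "*.tif" pattern and the folder layout are hypothetical, chosen only for illustration. No real option names are filled in.

```python
import subprocess
from pathlib import Path


def build_command(exe, images, out_dir, options=()):
    """Build the argument list: executable, options, input images, output directory.

    The argument order is an assumption about the CLI; verify it against
    the tool's help output.
    """
    return [str(exe), *options, *(str(p) for p in images), str(out_dir)]


def process_scans(exe, scan_dir, out_dir, options=()):
    """Run the CLI once over all TIFF scans found in scan_dir (hypothetical layout)."""
    images = sorted(Path(scan_dir).glob("*.tif"))
    if not images:
        return None  # nothing to process
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cmd = build_command(exe, images, out_dir, options)
    # check=True raises CalledProcessError if the tool exits with a non-zero status.
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

A scanning application could call process_scans("scantailor-cli.exe", "scans", "out") after each batch and then hand the files in the output folder to the OCR step. Passing the arguments as a list avoids quoting problems with paths that contain spaces.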

Re: Automation

Posted: 23 Jul 2011, 08:21
by Axel Arnold Bangert
Tulon wrote:The CLI branch was actually merged into the main branch some time ago. So, if you are on Windows, you can just download the 0.9.10rc1 and you'll find scantailor-cli.exe inside.
Dear Tulon,
thank you for the hint - I'll try.

Dewarping is a very interesting topic. The LEPTONICA library liblept has a basic dewarping module
with source ( http://tpgit.github.com/UnOfficialLeptD ... grams.html ),
which I am trying to learn from. I made a small test with LEPTONICA. It is very simple to use,
and I think it is comparable to ScanTailor. Here is a "quick and dirty" shot of the cleaning:
Image

I made another test with the original LEPTONICA dewarping example, which you can find here:
http://tpgit.github.com/Leptonica/dewar ... ource.html. The result of this test
is documented in a bound PDF sequence, which you can download here:
http://www.gimba.de/Leptonia-Dewarping-2.pdf . That is an excellent result. Naturally,
a variety of parameters still has to be tuned.

By chance I found the article "A Model-based Book Dewarping Method Using Text Line Detection"
( http://imlab.jp/cbdar2007/proceedings/papers/P1.pdf ),
published by Bin Fu, Minghui Wu, Rongfeng Li, Wenxin Li,
Zhuoqun Xu, Chunxu Yang. I guess that their approach is not SFS. The OCR software
(Tsinghua TH-OCR) that they refer to in their OCR test (94%) is part of Word 2007.
It's very interesting.
Best regards
Axel Arnold Bangert - Herzogenrath 2011