Automation


Axel Arnold Bangert

Automation

Post by Axel Arnold Bangert »

Dear supporters,
how can ScanTailor be automated? I have programmed a Windows scanning application for the V-Scanner that automatically
runs OmniPage OCR, Adobe OCR, or Tsinghua TH-OCR (Word 2007 Imaging Component) during each scanning process.
The ABBYY OCR SDK offers the best OCR automation features (full automation) and the best OCR quality. It
works very well and delivers fast, good results together with my own very simple image processing routines.

After each scan, the page is automatically appended to CompleteBook.pdf using the open-source iText PDF SDK, which
is very fast and reliable (in theory, the software speed would allow 1500 pages per hour).

The results would be much better if I could use the image processing routines of ScanTailor. But the only way I have
found to automate ScanTailor is to start a hidden instance programmatically and control it via the SendKeys methods.
That is not a good way of coding.

Do you know a better way to automate ScanTailor? Or would it perhaps be possible to develop some COM DLLs
that could be used for Windows automation?

Axel Arnold Bangert - Herzogenrath 2011
Last edited by Axel Arnold Bangert on 31 Jul 2011, 04:50, edited 6 times in total.
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Automation

Post by daniel_reetz »

Welcome, Axel! Did you know there is a command-line version of Scan Tailor?

Here is a thread announcing the release of command-line Scan Tailor.
Here is a thread about building the CLI branch on Windows.
Axel Arnold Bangert

Re: Automation

Post by Axel Arnold Bangert »

Thanks a lot - command line is fine.
Axel
daniel_reetz

Re: Automation

Post by daniel_reetz »

Please let us know how your work with the CLI version goes - I'd love to see your postprocessor in action!

Also, the Scan Tailor developers' mailing list is a good place to talk about issues with the CLI version. Tulon is not the main developer on the CLI branch, and the main author is not a member here.
Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: Automation

Post by Tulon »

The CLI branch was actually merged into the main branch some time ago. So, if you are on Windows, you can just download 0.9.10rc1 and you'll find scantailor-cli.exe inside.
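For anyone who wants to call scantailor-cli.exe from their own scanning software instead of using SendKeys, here is a minimal Python sketch of launching it as a child process. It assumes the tool takes the input images first and the output directory as the last argument, and it passes no options at all; please check the tool's own help output for the exact syntax of your build. The function names and the paths in the usage note are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_scantailor_command(exe, images, output_dir):
    """Return the argument list for one batch run of the command-line tool.

    Assumption: input images come first and the output directory is the
    last argument. No options are passed; add them only after checking
    the help output of your build.
    """
    if not images:
        raise ValueError("at least one input image is required")
    return [str(exe)] + [str(p) for p in images] + [str(output_dir)]


def run_scantailor(exe, images, output_dir):
    """Create the output directory, run the tool, and raise on failure."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    command = build_scantailor_command(exe, images, output_dir)
    # check=True raises CalledProcessError if the tool exits with an error code.
    subprocess.run(command, check=True)
```

A scanning application could call something like run_scantailor("scantailor-cli.exe", ["page1.tif", "page2.tif"], "out") after each batch of scans and then pick up the processed pages from the output directory.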
Scan Tailor experimental doesn't output 96 DPI images. It's just what your software shows when DPI information is missing. Usually what you get is input DPI times the resolution enhancement factor.
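To illustrate that arithmetic, here is a tiny Python sketch. The function names are only for illustration, and the viewer fallback models the typical behaviour described above, not any particular program.

```python
def effective_output_dpi(input_dpi, enhancement_factor):
    """Output resolution as input DPI times the resolution enhancement factor."""
    return input_dpi * enhancement_factor


def displayed_dpi(stored_dpi, viewer_default=96):
    """What a viewer typically reports: the stored DPI, or its own default
    (assumed to be 96 here) when the file carries no DPI information."""
    return viewer_default if stored_dpi is None else stored_dpi


# A 300 DPI scan with an enhancement factor of 2 is really a 600 DPI image,
# even if a viewer shows 96 because the DPI tag is missing.
real = effective_output_dpi(300, 2)
shown = displayed_dpi(None)
```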
Axel Arnold Bangert

Re: Automation

Post by Axel Arnold Bangert »

Dear Tulon,
thank you for the hint - I'll try.

Dewarping is a very interesting topic. The Leptonica library (liblept) has a basic dewarping module
with source code ( http://tpgit.github.com/UnOfficialLeptD ... grams.html ),
which I am studying to learn a little. I ran a small test with Leptonica. It is very simple to use,
and I think it is comparable to ScanTailor. Here is a quick-and-dirty result of the cleaning:
Image

I ran another test with the original Leptonica dewarping example, which you can find here:
http://tpgit.github.com/Leptonica/dewar ... ource.html. The result of this test
is documented as a sequence of pages bound into one PDF, which you can download here:
http://www.gimba.de/Leptonia-Dewarping-2.pdf . That is an excellent result. Naturally,
a variety of parameters still has to be adjusted.

By chance I found the article "A Model-based Book Dewarping Method Using Text Line Detection"
( http://imlab.jp/cbdar2007/proceedings/papers/P1.pdf ),
published by Bin Fu, Minghui Wu, Rongfeng Li, Wenxin Li,
Zhuoqun Xu, and Chunxu Yang. I guess that their approach is not SFS. The OCR software
they refer to in their OCR test (Tsinghua TH-OCR, 94%) is part of Word 2007.
It's very interesting.
Best regards
Axel Arnold Bangert - Herzogenrath 2011