Scan Tailor Advanced

dtic
Posts: 464
Joined: 06 Mar 2010, 18:03

Re: Scan Tailor Advanced

Post by dtic »

zbgns wrote: 26 Jun 2018, 17:45 I guess this happens in case of MRC compressed pdf files.
https://en.wikipedia.org/wiki/Mixed_raster_content
Images with inverted colors (black background and white letters) obtained by unpacking images from pdfs are masks placed on top of color backgrounds. It is the most effective way to obtain crisp color text on a white background. Typically there is a jp2-compressed color background and a mask with jbig2 compression. With such pdfs it may be better to render an image of the whole page instead of extracting pictures from it. Otherwise you may get several pictures, each representing only part of an input page, which are usually useless for further processing such as preparation for OCR.
I don't know much about that format/method. But I think I've run into the problem with several pictures for parts of the same page.
Have you done tests comparing Acrobat against ImageMagick or other open tools for pdf to jpg image render?
There are some ImageMagick recipes here https://stackoverflow.com/questions/660 ... resolution
zbgns wrote: 26 Jun 2018, 17:45 Does anyone know of an open implementation of this MRC method that may be used for non-commercial projects? I guess there are only proprietary implementations so far. E.g. Abbyy FineReader is able to produce such MRC pdfs, which is a big advantage of this tool.
You mean an open tool to create MRC pdf files? Don't know.
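For readers unfamiliar with the mask-over-background idea quoted above, here is a minimal Python sketch of how such a page could be recomposed. It is a simplification under stated assumptions: the mask and the background have the same pixel dimensions, the text layer is a single flat color, and the names `compose_mrc`, `background`, `mask` and `foreground` are hypothetical, not taken from any real MRC tool.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def compose_mrc(background: List[List[Pixel]],
                mask: List[List[int]],
                foreground: Pixel = (0, 0, 0)) -> List[List[Pixel]]:
    """Paint `foreground` wherever the mask is set; keep the background elsewhere.

    Assumption: mask and background have identical dimensions. Real MRC
    files often store the layers at different resolutions.
    """
    if len(mask) != len(background) or any(
        len(m_row) != len(b_row) for m_row, b_row in zip(mask, background)
    ):
        raise ValueError("mask and background must have the same dimensions")
    return [
        [foreground if m else bg for m, bg in zip(m_row, b_row)]
        for m_row, b_row in zip(mask, background)
    ]


# A 1x2 "page": the first pixel is covered by text, the second is not.
page = compose_mrc([[(255, 255, 255), (200, 200, 200)]], [[1, 0]])
```

This also shows why an extracted mask alone is of little use: without the background layer it is only half of the page.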
zbgns
Posts: 61
Joined: 22 Dec 2016, 06:07
E-book readers owned: Tolino, Kindle
Number of books owned: 600
Country: Poland

Re: Scan Tailor Advanced

Post by zbgns »

dtic wrote: 27 Jun 2018, 19:05
Have you done tests comparing Acrobat against ImageMagick or other open tools for pdf to jpg image render?
There are some ImageMagick recipes here https://stackoverflow.com/questions/660 ... resolution
I have tried various methods in the past. My first was pdf2tiff, which seems to be abandoned and is hard to find now. Since then I have also used Adobe Acrobat Pro and Master PDF Editor (commercial GUI tools with a very wide feature set). The obvious choices are ImageMagick and Ghostscript, both of which I also use. There are other tools as well, e.g. pdftoppm, but I do not even remember whether I tested it. I didn't do any systematic comparison among them, as the output quality was sufficient for my purposes in each case. The crucial thing is to set proper parameters, especially DPI; 300 DPI seems to be optimal in most cases. It is also a good idea to choose a lossless image format for the output, such as TIFF or PNG. JPG is lossy and adds awful distortion to B&W pictures, so I try to avoid it even if the pictures embedded in the pdf file are in JPG format. This doesn't apply when JPG pictures are unpacked directly from the pdf (with no format conversion) using pdfimages, as there should be no loss in quality.
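As a rough illustration of the settings described above (300 DPI, lossless output), here is a small Python sketch that builds and runs a pdftoppm command. The function names and the "page" output prefix are hypothetical; the `-r`, `-png` and `-tiff` flags are pdftoppm's own, though `-tiff` is assumed to be available in your build.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_render_command(pdf_path: str, output_prefix: str,
                         dpi: int = 300, fmt: str = "png") -> List[str]:
    """Return the pdftoppm argument list for a whole-page, lossless render."""
    if fmt not in ("png", "tiff"):
        raise ValueError("use a lossless format: 'png' or 'tiff'")
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    # -r sets the resolution in DPI; -png / -tiff select the output format.
    return ["pdftoppm", "-r", str(dpi), f"-{fmt}", pdf_path, output_prefix]


def render_pdf(pdf_path: str, output_dir: str,
               dpi: int = 300, fmt: str = "png") -> None:
    """Render every page of the pdf into output_dir (requires pdftoppm on PATH)."""
    if shutil.which("pdftoppm") is None:
        raise RuntimeError("pdftoppm not found on PATH")
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_render_command(pdf_path, str(out / "page"), dpi, fmt),
                   check=True)
```

Calling `render_pdf("book.pdf", "pages")` would write one image per page into the `pages` folder; the file name `book.pdf` is only an example.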
dtic wrote: 27 Jun 2018, 19:05 You mean an open tool to create MRC pdf files? Don't know.
There was pdfbeads, which did something like that. But it has not been under development for a long time.
4lex4
Posts: 29
Joined: 15 Oct 2017, 12:35
Number of books owned: 0
Country: Russia

Re: Scan Tailor Advanced

Post by 4lex4 »

A new update with new features, fixes and improvements has been released. Also if anyone wants to support the project or express thanks, please donate.
zbgns
Posts: 61
Joined: 22 Dec 2016, 06:07
E-book readers owned: Tolino, Kindle
Number of books owned: 600
Country: Poland

Re: Scan Tailor Advanced

Post by zbgns »

4lex4 wrote: 05 Jul 2018, 21:58 A new update with new features, fixes and improvements has been released. Also if anyone wants to support the project or express thanks, please donate.
I would like to express my thanks :D
I really appreciate the development of this great software and the new features, which I had a chance to see earlier by checking the developer branch on GitHub. I would also like to take this opportunity to thank Tulon for starting this great project and for the earlier versions of Scan Tailor, especially for releasing Scan Tailor Experimental, which I used with pleasure for a couple of years.
L.Willms
Posts: 134
Joined: 21 Sep 2016, 10:51
E-book readers owned: Tolino Shine
Country: Germany
Location: Frankfurt/Main, Germany

Re: Scan Tailor Advanced

Post by L.Willms »

4lex4 wrote: 05 Jul 2018, 21:58 A new update with new features, fixes and improvements has been released.
It is an installer this time. Now I wonder where this version of Scan Tailor Advanced stores its config?

The previous versions were simply unpacked into a subfolder of UTIL, and kept the config in a subdirectory CONFIG.
4lex4 wrote: 05 Jul 2018, 21:58 Also if anyone wants to support the project or express thanks, please donate.
Tell me your bank account per private message...
L.Willms
Posts: 134
Joined: 21 Sep 2016, 10:51
E-book readers owned: Tolino Shine
Country: Germany
Location: Frankfurt/Main, Germany

Re: Scan Tailor Advanced

Post by L.Willms »

L.Willms wrote: 15 Jul 2018, 07:22 I wonder where this version of Scan Tailor Advanced stores its config?

The previous versions were simply unpacked into a subfolder of UTIL, and kept the config in a subdirectory CONFIG.
The "scantailor.ini" of 10.0.14 is located not only in the "config\scantailor" subdirectory of the program directory (c:\util\scantailor-advanced\) but also in "c:\program data\Scantailor-advanced\config\scantailor" and also in "C:\Users\All Users\Scantailor-advanced\config\scantailor" -- all three are identical.

But 10.0.15 of Scantailor Advanced sees none of them, and I can't (yet) find where this version stores its config. Please clarify!
L.Willms
Posts: 134
Joined: 21 Sep 2016, 10:51
E-book readers owned: Tolino Shine
Country: Germany
Location: Frankfurt/Main, Germany

Re: Scan Tailor Advanced

Post by L.Willms »

L.Willms wrote: 16 Jul 2018, 02:54
L.Willms wrote: 15 Jul 2018, 07:22 I wonder where this version of Scan Tailor Advanced stores its config?

The previous versions were simply unpacked into a subfolder of UTIL, and kept the config in a subdirectory CONFIG.
The "scantailor.ini" of 10.0.14 is located not only in the "config\scantailor" subdirectory of the program directory (c:\util\scantailor-advanced\) but also in "c:\program data\Scantailor-advanced\config\scantailor" and also in "C:\Users\All Users\Scantailor-advanced\config\scantailor" -- all three are identical.

But 10.0.15 of Scantailor Advanced sees none of them, and I can't (yet) find where this version stores its config. Please clarify!
I finally found it. Version 10.0.15 keeps only one copy of the INI file, but it has been renamed from "scantailor" to "scantailor-advanced". It is now at

C:\Users\%userid%\AppData\Roaming\scantailor-advanced\scantailor-advanced.ini
(when C is your boot drive)

I now wonder how compatible or incompatible the contents of the INI-File are.
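For anyone who wants to locate that file from a script, a minimal Python sketch follows. It assumes the installed build uses the per-user location given above and that `%APPDATA%` points at `C:\Users\<user>\AppData\Roaming`, as it normally does on Windows. The function name `sta_config_path` is hypothetical.

```python
import os
from pathlib import PureWindowsPath
from typing import Optional


def sta_config_path(appdata: Optional[str] = None) -> PureWindowsPath:
    """Build the expected path of scantailor-advanced.ini.

    `appdata` overrides the APPDATA environment variable, which is handy
    for testing or when the script runs on a different machine.
    """
    base = appdata if appdata is not None else os.environ.get("APPDATA")
    if not base:
        raise RuntimeError("APPDATA is not set; pass the roaming profile path explicitly")
    return PureWindowsPath(base) / "scantailor-advanced" / "scantailor-advanced.ini"


# Example with an explicit, made-up user name:
example = sta_config_path(r"C:\Users\alice\AppData\Roaming")
```

This only builds the path; it does not check that the file exists or that its contents are compatible with older versions.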
L.Willms
Posts: 134
Joined: 21 Sep 2016, 10:51
E-book readers owned: Tolino Shine
Country: Germany
Location: Frankfurt/Main, Germany

Re: Scan Tailor Advanced

Post by L.Willms »

4lex4 wrote: 05 Jul 2018, 21:58 A new update with new features, fixes and improvements has been released.
And from one mini-minor release to the next (14 to 15), one can't use the little update to continue to work on current projects: "The project file is incompatible with the current application version".

Please add a feature that each new version can convert the config of previous projects!
4lex4
Posts: 29
Joined: 15 Oct 2017, 12:35
Number of books owned: 0
Country: Russia

Re: Scan Tailor Advanced

Post by 4lex4 »

Updated.

L.Willms,
Read the readme.
You can either unpack the installer or use it to install into a custom, non-system path; STA will then store the config inside the app directory.
Tell me your bank account per private message...
Use the donate link below: you don't need a PayPal account, just click the continue link at the bottom of the page.
And from one mini-minor release to the next (14 to 15), one can't use the little update to continue to work on current projects
The changes in 15 weren't really a minor update or a patch to 14: they changed the implementation of some existing features and added new ones, which makes the new schema incompatible with the old one in lots of different places.
L.Willms
Posts: 134
Joined: 21 Sep 2016, 10:51
E-book readers owned: Tolino Shine
Country: Germany
Location: Frankfurt/Main, Germany

Re: Scan Tailor Advanced

Post by L.Willms »

I really appreciate your work and efforts, and want to express my thanks to you.
4lex4 wrote: 16 Jul 2018, 23:05 Use the donate link below: you don't need a PayPal account, just click the continue link at the bottom of the page.
I wouldn't touch PayPal even with a ten-foot pole.
4lex4 wrote: 16 Jul 2018, 23:05
And from one mini-minor release to the next (14 to 15), one can't use the little update to continue to work on current projects
The changes in 15 weren't really a minor update or a patch to 14: they changed the implementation of some existing features and added new ones, which makes the new schema incompatible with the old one in lots of different places.
Third-level changes in the version number normally indicate only bug fixes, second-level changes indicate feature changes, and first-level changes indicate major changes in the software base. That has been good practice in the computer industry since its inception.

Anyway, the evolution of software should let users move along with it, and not nail them to older versions because projects cannot be moved to the later software.

New feature levels should provide a means to upgrade existing work to the new features. It should not be too difficult to provide this with the new level.
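The numbering convention described above can be sketched in a few lines of Python. This encodes only the convention as stated in this post, not Scan Tailor Advanced's actual release policy; the function names and the version strings in the example are made up for illustration.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Split a 'major.minor.patch' string into three integers."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected three dot-separated numbers, got {text!r}")
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


def classify_change(old: str, new: str) -> str:
    """Name the kind of change implied by moving from `old` to `new`."""
    o, n = parse_version(old), parse_version(new)
    if n == o:
        return "same version"
    if n < o:
        return "downgrade"
    if n[0] != o[0]:
        return "major change"      # first level changed
    if n[1] != o[1]:
        return "feature change"    # second level changed
    return "bug fix"               # only the third level changed


kind = classify_change("2.3.4", "2.3.5")
```

Under this convention a step that only bumps the last number would be expected to keep project files readable.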