Scan Tailor Advanced

Scan Tailor specific announcements, releases, workflows, tips, etc. NO FEATURE REQUESTS IN THIS FORUM, please.

Moderator: peterZ

4lex4
Posts: 29
Joined: 15 Oct 2017, 12:35
Number of books owned: 0
Country: Russia

Re: Scan Tailor Advanced

Post by 4lex4 »

0kelvin wrote: 01 May 2018, 10:50 Noticed that blank pages cause Scantailor to crash when exporting pages, if there is no content box.
What is the version shown in the About dialog? You seem to be using an old version.
Shyamasundara
Posts: 15
Joined: 29 May 2012, 09:43
E-book readers owned: I use PDF on my Mac
Number of books owned: 3500
Country: India
Location: Bangalore, India

Re: Scan Tailor Advanced

Post by Shyamasundara »

bokane wrote: 22 Feb 2018, 01:12 Wonderful to see more updates to Scan Tailor -- thanks for your hard work!

Has anybody managed to get this to compile on OSX? I'm running into errors when I try to build it; these would probably be fixable if I spent an afternoon Googling, but I'd be extremely grateful if someone could just post an executable.
Have you been able to get a working version for High Sierra? The old version I had, which worked on my old Mac with Mavericks, doesn't work on the latest macOS. It would really help if a working Mac version could be made available, because I have never tried compiling from source code, and it looks like it would be too easy to mess things up and spend a lot of time running in circles when you have no experience in such matters.
gautxori
Posts: 1
Joined: 22 May 2018, 19:52
Number of books owned: 9996
Country: Spain

Re: Scan Tailor Advanced

Post by gautxori »

Could we see a feature like this implemented in "Advanced"?
[Attachment: delta margins.png]
What I'm trying to suggest is adding numerical fields that set margins incrementally, relative to the current ones, so that only ticked increments would be modified. The ticks are to the right of the "delta" column, and each one affects only the field it follows. In the mock-up I dimmed all the numerical fields except the ticked one, meaning that all the others are disabled and would not be modified. As soon as a delta field is ticked, the remaining numerical fields should be disabled.

That would let us change the margin for several pages with the same incremental value, provided the pages were selected.

I'd also like to see a similar feature for the content boxes.

Also, it would be very helpful if all selected pages were identified in the .Scantailor project file with a selected="selected" attribute. Currently, no matter which pages are selected, that attribute is set only for the currently displayed page(!).
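
To make the incremental idea concrete, here is a minimal Python sketch of the logic I have in mind. It is not Scan Tailor code: the names Margins, apply_deltas and apply_to_selection are hypothetical, and clamping at zero is my own assumption. A delta of None stands for an unticked field, which is left alone.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Margins:
    """Margins of one page, all in the same unit (e.g. mm). Hypothetical type."""
    top: float
    bottom: float
    left: float
    right: float


def apply_deltas(margins: Margins, deltas: Dict[str, Optional[float]]) -> Margins:
    """Add each ticked delta to the matching side; None means 'unticked'."""
    updates = {
        side: max(0.0, getattr(margins, side) + delta)  # assumption: never go below 0
        for side, delta in deltas.items()
        if delta is not None
    }
    return replace(margins, **updates)


def apply_to_selection(
    pages: Dict[int, Margins],
    selected: Iterable[int],
    deltas: Dict[str, Optional[float]],
) -> Dict[int, Margins]:
    """Apply the same deltas to every selected page; other pages are unchanged."""
    chosen = set(selected)
    return {
        number: apply_deltas(m, deltas) if number in chosen else m
        for number, m in pages.items()
    }


if __name__ == "__main__":
    pages = {1: Margins(10, 10, 5, 5), 2: Margins(12, 8, 5, 5), 3: Margins(10, 10, 5, 5)}
    # Only "top" is ticked: pages 1 and 2 each gain 2 units at the top.
    print(apply_to_selection(pages, [1, 2], {"top": 2.0, "left": None}))
```

The point is that each selected page keeps its own current margins and only the ticked sides move by the same amount.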
Scanallthebooks
Posts: 38
Joined: 01 Dec 2016, 19:05
Number of books owned: 0
Country: Denmark

Re: Scan Tailor Advanced

Post by Scanallthebooks »

gautxori wrote: 22 May 2018, 21:38 Could we see a feature like this implemented in "Advanced"?

What I'm trying to suggest is adding numerical fields that set margins incrementally, relative to the current ones, so that only ticked increments would be modified. The ticks are to the right of the "delta" column, and each one affects only the field it follows. In the mock-up I dimmed all the numerical fields except the ticked one, meaning that all the others are disabled and would not be modified. As soon as a delta field is ticked, the remaining numerical fields should be disabled.

That would let us change the margin for several pages with the same incremental value, provided the pages were selected.

I'd also like to see a similar feature for the content boxes.

Also, it would be very helpful if all selected pages were identified in the .Scantailor project file with a selected="selected" attribute. Currently, no matter which pages are selected, that attribute is set only for the currently displayed page(!).
I second these suggestions, especially the incremental margin change for multiple pages.
Sallen112
Posts: 3
Joined: 02 Jun 2018, 21:34
Number of books owned: 0
Country: United States

Re: Scan Tailor Advanced

Post by Sallen112 »

Dear Scantailor Developers

I am an active user who recently started using Scantailor Advanced; for about 3 years I had been using the original Scantailor from the main website that we all download from. I was unaware that there has been development all this time, improving the original in multiple versions, with Scantailor Advanced coming out last year. I just downloaded it, and I must say that compared to the original, this is by far the best version of this program yet! I do have a few improvements I would like to see in the program (sorry, I don't know where to submit new feature suggestions):

1. The ability to import a PDF and extract its images into TIFF files before the stage process begins, with the ability to choose the DPI at which the images are extracted. This way I wouldn't need a third-party PDF-to-TIFF converter to get the images into the program.

2. After stage 6 (maybe as a new stage 7), the ability to combine all images in the project into a single new PDF (or separate PDF files) stored in the Out folder. This way I wouldn't need a PDF editor to combine the files. Scantailor should really do this now, since there has been a lot of development since the original one. Or could this feature be incorporated into step 6 somehow?

3. In Step 5, I would like to see a Page size option (similar to what PDF editors have), with two fields for the length and width of the entire page, in addition to expanding the margins with the existing Top, Bottom, Left and Right measurements. A drop-down menu would offer common page sizes such as Letter, A4, A3, Legal, etc., or you could set your own custom page size in the two length and width fields. The page size should be selectable in metric or English units (centimeters, millimeters, inches and pixels), as you can toggle them in Tools ---> Units.

4. In Stage 4, I would like to see something similar to Step 6's Picture shape: an option to select a higher search sensitivity in the content box section, with a percentage sensitivity field with up and down arrows. This could solve the issue of artifacts or smudges outside the text box with color or black-and-white images (I know you're not supposed to input black-and-white images, but I think this could really solve the issue in this step), so that the text or images on the page are selected a lot better than before.

5. I would like to see an installer .exe like the ones other versions of Scantailor use, such as Scantailor Universal.

Let me know if this would be possible to implement in a future version of Scantailor. If the above can be done, I think the program would be perfect!
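
For suggestion 3, the arithmetic behind a page size option is simple enough to sketch in Python. This is only an illustration under my own assumptions, not anything from Scan Tailor: the names PAGE_SIZES_MM, page_size_px and margins_for_page are hypothetical, and centering the content on the page is just one possible choice.

```python
MM_PER_INCH = 25.4

# Standard sheet sizes as (width, height) in millimetres.
PAGE_SIZES_MM = {
    "A4": (210.0, 297.0),
    "A3": (297.0, 420.0),
    "Letter": (215.9, 279.4),
    "Legal": (215.9, 355.6),
}


def page_size_px(name: str, dpi: int) -> tuple:
    """Convert a named page size to whole pixels at the given DPI."""
    width_mm, height_mm = PAGE_SIZES_MM[name]
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


def margins_for_page(content_w: int, content_h: int, page_w: int, page_h: int) -> dict:
    """Margins (in pixels) that center the content box on the target page."""
    if content_w > page_w or content_h > page_h:
        raise ValueError("content box is larger than the requested page size")
    extra_w = page_w - content_w
    extra_h = page_h - content_h
    left = extra_w // 2
    top = extra_h // 2
    # Any odd leftover pixel goes to the right/bottom side.
    return {"left": left, "right": extra_w - left, "top": top, "bottom": extra_h - top}


if __name__ == "__main__":
    page_w, page_h = page_size_px("A4", 300)  # (2480, 3508)
    print(margins_for_page(2000, 3000, page_w, page_h))
```

So picking a page size would just set the four margins for you, and a custom size would use the same calculation with the two typed-in values.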
L.Willms
Posts: 134
Joined: 21 Sep 2016, 10:51
E-book readers owned: Tolino Shine
Country: Germany
Location: Frankfurt/Main, Germany

Re: Scan Tailor Advanced

Post by L.Willms »

Sallen112 wrote: 06 Jun 2018, 10:22 feature suggestions):

1. The ability to import a PDF and extract its images into TIFF files before the stage process begins, with the ability to choose the DPI at which the images are extracted. This way I wouldn't need a third-party PDF-to-TIFF converter to get the images into the program.

2. After stage 6 (maybe as a new stage 7), the ability to combine all images in the project into a single new PDF (or separate PDF files) stored in the Out folder. This way I wouldn't need a PDF editor to combine the files. Scantailor should really do this now, since there has been a lot of development since the original one. Or could this feature be incorporated into step 6 somehow?
Against.

Let the image processing software process images and the PDF processing software process PDF files.

BTW, the whole of CS2 is (or at least was some years ago) downloadable from Adobe's website, with all necessary product keys in a publicly accessible table. This included Acrobat Pro 8, which in its latest update, 8.3, works here on my PC.

Besides, the next step after Scan Tailor is not in every case the combination of those images into a PDF; it may be an OCR step, and that is also done by other software specialized for that task.

I have some misgivings about Scan Tailor in steps 4 (Select Content) and 5 (Margins), which I want to raise in a separate topic.
antwoorden
Posts: 5
Joined: 03 May 2018, 07:43
Number of books owned: 0
Country: The Netherlands

Re: Scan Tailor Advanced

Post by antwoorden »

4lex4, you've done great work with ST Advanced, and it's nice to see you're planning to implement the features of ST Experimental! But there's one functionality I'd advise to exclude: moving the dewarping from the final to the third stage. At least there should be an ability to keep it in stage 3. The reason for this is that the dewarping often crashes with many of my scanned books. When this happens in the 3rd stage, it crashes my whole project. So for me, currently, the dewarping in the output stage is the only big advantage of ST Advanced over ST Experimental. The other features of ST Experimental are awesome!
Sallen112
Posts: 3
Joined: 02 Jun 2018, 21:34
Number of books owned: 0
Country: United States

Re: Scan Tailor Advanced

Post by Sallen112 »

L.Willms wrote: 07 Jun 2018, 08:21
Sallen112 wrote: 06 Jun 2018, 10:22 feature suggestions):

1. The ability to import a PDF and extract its images into TIFF files before the stage process begins, with the ability to choose the DPI at which the images are extracted. This way I wouldn't need a third-party PDF-to-TIFF converter to get the images into the program.

2. After stage 6 (maybe as a new stage 7), the ability to combine all images in the project into a single new PDF (or separate PDF files) stored in the Out folder. This way I wouldn't need a PDF editor to combine the files. Scantailor should really do this now, since there has been a lot of development since the original one. Or could this feature be incorporated into step 6 somehow?
Against.

Let the image processing software process images and the PDF processing software process PDF files.

BTW, the whole of CS2 is (or at least was some years ago) downloadable from Adobe's website, with all necessary product keys in a publicly accessible table. This included Acrobat Pro 8, which in its latest update, 8.3, works here on my PC.

Besides, the next step after Scan Tailor is not in every case the combination of those images into a PDF; it may be an OCR step, and that is also done by other software specialized for that task.

I have some misgivings about Scan Tailor in steps 4 (Select Content) and 5 (Margins), which I want to raise in a separate topic.
I don't really understand why you think Scantailor should not have a PDF-to-image extractor (and the other way around). It would save having to find other software online, and since Scantailor's main purpose is to prepare text images into a PDF, why not include the ability to do this sort of thing now? First of all, it would greatly speed up post-processing. Basically, what I am imagining is a separate function within the program; I don't know why that would be a bad thing. So what if the program gets bigger with this function included? That is why this is the advanced version of the original.
antwoorden
Posts: 5
Joined: 03 May 2018, 07:43
Number of books owned: 0
Country: The Netherlands

Re: Scan Tailor Advanced

Post by antwoorden »

Another option could be to save the TIFF files as one multipage TIFF, so it can be fed directly to e.g. Tesseract.
L.Willms
Posts: 134
Joined: 21 Sep 2016, 10:51
E-book readers owned: Tolino Shine
Country: Germany
Location: Frankfurt/Main, Germany

Re: Scan Tailor Advanced

Post by L.Willms »

Sallen112 wrote: 16 Jun 2018, 00:19 Scantailor's main purpose is to prepare text images into a PDF,
No, that is not Scan Tailor's main purpose.

Its purpose is the production of clear and crisp images, ready for OCR or other processes, from original scans or photographs.

It is good practice not to lump together a whole lot of unrelated tasks into one program, but to develop programs for one special task, and then to chain them.

I like Scan Tailor because its image processing is in many aspects better than the image processing in ABBYY FineReader.

ABBYY FineReader is normally the next step in my workflow. It takes the output from Scan Tailor as input and produces either PDFs with text behind the image ("exact copy") or free-flowing text, from which I produce, in yet another step, either HTML files for publication on the Web or EPUB files for e-book readers. In some cases, when the text is set in Fraktur (Gothic letters), which is not recognised by the plain ABBYY FineReader (but is by their Recognition Server), I combine the images directly into a PDF using Acrobat.

As to extracting images from a PDF file, I recommend the freely available Adobe CS2 (with Acrobat 8 Pro), for which the product keys have been made freely accessible by Adobe on the Web, here via Archive Org. You would have to install the fixes to Acrobat 8 incrementally in order to finally get Acrobat 8.3. The full update in one step is no longer available since Acrobat 8 is out of maintenance. See also (in German) http://www.chip.de/downloads/Adobe-CS2- ... 62988.html As a caveat, it has to be noted that the product keys published by Adobe are meant to be used only by people who had purchased CS2 products before and need the key for reinstalling the software on, say, new hardware.