How to get the same good result for this picture using Scan Tailor

Scan Tailor specific announcements, releases, workflows, tips, etc. NO FEATURE REQUESTS IN THIS FORUM, please.

Moderator: peterZ

fonoblax
Posts: 8
Joined: 31 Jan 2017, 09:06
Number of books owned: 0
Country: Morocco

How to get the same good result for this picture using Scan Tailor

Post by fonoblax »

Hi,
I'm new to the forum. Before I became a member I had heard a lot about DIY Book Scanner and Scan Tailor. I want to scan some books and learn the techniques for creating a perfect e-book. For the moment I'm only interested in post-processing, so I browsed the forum and found this topic. What interests me in that topic is how he created a perfect image from the original sample page he scanned, using Scan Tailor. I used the same original picture from the topic, but the output from Scan Tailor was very bad. Even when I change the parameters and values, I really can't reproduce the same clean, good-quality black-and-white page he did. I'd like your help finding out the secret behind this.

This is the image he worked with:
Image

And his final result is this picture:
Image

Now I will show you my own attempts in Scan Tailor. I tried a lot of parameters, like making the text bolder, removing noise, and changing the DPI from the minimum (150) to the maximum (900), but I still wasn't able to get clean, good-quality text like he did.

This one was too bold; letters like "s" are damaged:
Image

In this one I made the text lighter and removed noise:
Image

Even so, the quality is still bad. I have other output pictures, but the ones I posted here are the best.

I hope I described my situation clearly, and I hope you can tell me what is missing. Thanks!
duerig
Posts: 388
Joined: 01 Jun 2014, 17:04
Number of books owned: 1000
Country: United States of America

Re: How to get the same good result for this picture using Scan Tailor

Post by duerig »

Scan Tailor uses a global threshold for binarizing images (making them black and white). This tends to interact badly with pages that have uneven lighting (like the picture you linked to). To account for this, there is an option in ST to apply a filter that evens out the lighting before binarization. Try enabling that option. It doesn't always work, but maybe it will in this case.
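To make the global-threshold problem concrete, here is a toy 1-D sketch (not Scan Tailor's code). It assumes a hypothetical scanline of 0-255 gray values where the paper brightness falls from 250 on the left to 60 on the right (a shadow), ink reflects 30% as much light as the paper around it, and one fixed level of 128 decides black vs. white. The ink is always classified correctly, but the shadowed paper on the right gets turned black too.

```python
# Toy 1-D illustration of why one global threshold struggles with uneven lighting.
# Hypothetical assumptions: 0-255 grayscale, paper brightness falls linearly from
# 250 (left) to 60 (right, in shadow), and ink reflects 30% as much light as paper.
# This is NOT Scan Tailor's code, just a sketch of the failure mode.

WIDTH = 40
INK_EVERY = 5  # every 5th pixel stands in for a letter stroke


def make_scanline():
    """Return (pixels, is_ink) for one synthetic row with a lighting gradient."""
    pixels, is_ink = [], []
    for x in range(WIDTH):
        paper = 250 - 190 * x / (WIDTH - 1)
        ink = x % INK_EVERY == 0
        pixels.append(paper * 0.3 if ink else paper)
        is_ink.append(ink)
    return pixels, is_ink


def global_threshold(pixels, level=128):
    """Mark a pixel black (True) whenever it is darker than one fixed level."""
    return [p < level for p in pixels]


def count_errors(predicted, truth):
    """Count pixels whose black/white label disagrees with the ground truth."""
    return sum(p != t for p, t in zip(predicted, truth))


pixels, truth = make_scanline()
mistakes = count_errors(global_threshold(pixels), truth)
print(f"{mistakes} of {WIDTH} pixels misclassified")  # 12 of 40 pixels misclassified
```

All 12 errors are paper pixels in the shadowed right-hand part of the row; raising or lowering the single level just trades those errors for damaged letters elsewhere, which matches the "too bold" vs. "too light" results above.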

Also, make sure you are using ST Experimental, which is the most recently maintained version.

Overall, I don't find binarization to be worth the trade-off for myself. I have spent way too much time trying to pick apart artifacts and fix them and would probably have been happier if I'd just done grayscale (or color) from the start instead. :)

-Jonathon Duerig
duerig

Re: How to get the same good result for this picture using Scan Tailor

Post by duerig »

I'm not sure exactly what heuristic Scan Tailor uses to even out light. But I just heard about a way to do it with ImageMagick and the result on that image is striking.

Here is the link:

http://www.imagemagick.org/Usage/compose/#divide

And here is the result of running it on that image:
test-divided.jpg
This is without using Scan Tailor at all. I think that when it comes time for Scan Tailor to binarize this new image, it will have a much easier time. And this may be the same method that Scan Tailor uses to even out light internally if you check the option.
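Here is a rough 1-D sketch of the "divide by a blurred copy" idea from the linked ImageMagick page, using the same kind of hypothetical shadowed scanline (paper 250 down to 60, ink at 30% of the paper). Assumptions: a simple box (moving-average) blur stands in for ImageMagick's Gaussian `-blur 0x20`, and results are clamped to 1.0 since a pixel can't be whiter than white. This only illustrates the technique; it is not ImageMagick's or Scan Tailor's implementation.

```python
# Toy 1-D sketch of "divide by a blurred copy" illumination flattening,
# followed by a single global threshold. Hypothetical assumptions: a shadowed
# scanline (paper 250 -> 60, ink = 30% of paper) and a box blur standing in
# for ImageMagick's Gaussian -blur 0x20.

WIDTH = 40
INK_EVERY = 5


def make_scanline():
    """Return (pixels, is_ink) for one synthetic row with a lighting gradient."""
    pixels, is_ink = [], []
    for x in range(WIDTH):
        paper = 250 - 190 * x / (WIDTH - 1)
        ink = x % INK_EVERY == 0
        pixels.append(paper * 0.3 if ink else paper)
        is_ink.append(ink)
    return pixels, is_ink


def box_blur(pixels, radius=4):
    """Moving average; the window is truncated at the ends of the row."""
    out = []
    for x in range(len(pixels)):
        window = pixels[max(0, x - radius): x + radius + 1]
        out.append(sum(window) / len(window))
    return out


def divide_by_background(pixels, radius=4):
    """Divide each pixel by the blurred background estimate, clamped to 1.0."""
    background = box_blur(pixels, radius)
    return [min(1.0, p / b) for p, b in zip(pixels, background)]


pixels, truth = make_scanline()
flattened = divide_by_background(pixels)
binary = [v < 0.5 for v in flattened]  # one global level now works everywhere
mistakes = sum(b != t for b, t in zip(binary, truth))
print(f"{mistakes} of {WIDTH} pixels misclassified after flattening")
```

After the division, paper sits near 1.0 and ink near 0.3 regardless of how dark the shadow was, so a single threshold separates them cleanly; that is why the divided image should be much easier to binarize.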

-D
Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: How to get the same good result for this picture using Scan Tailor

Post by Tulon »

fonoblax, the reason you can't match the quality achieved in the other thread is that the input image uploaded to the forum is a downscaled version of the input image ChrisG used in the original thread. How do I know that? The version on the forum is tiny: just 1023x682 pixels and less than 100 DPI. You just can't achieve decent quality with such input material.
Scan Tailor experimental doesn't output 96 DPI images. It's just what your software shows when DPI information is missing. Usually what you get is input DPI times the resolution enhancement factor.
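The DPI reasoning above comes down to two simple formulas: capture resolution is pixels divided by the physical length they cover, and (per the note above) output DPI is usually input DPI times the resolution enhancement factor. The sketch below uses made-up numbers (a 7.5-inch page edge, an enhancement factor of 2); they are hypothetical, not measurements from the thread.

```python
# Hypothetical helpers for the DPI arithmetic discussed above.


def scan_dpi(pixels_along_edge, inches_along_edge):
    """Resolution of a capture: pixels divided by the physical length they cover."""
    return pixels_along_edge / inches_along_edge


def st_output_dpi(input_dpi, enhancement_factor):
    """Output DPI as described above: input DPI times the enhancement factor."""
    return input_dpi * enhancement_factor


# Made-up example: 682 pixels spanning a 7.5-inch page edge.
dpi = scan_dpi(682, 7.5)
print(round(dpi, 1))          # 90.9
print(st_output_dpi(300, 2))  # 600
```

With so few pixels per inch, small letters are only a handful of pixels tall, and no binarization setting can recover detail that was never captured.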
fonoblax

Re: How to get the same good result for this picture using Scan Tailor

Post by fonoblax »

Thank you for the answer, duerig!
I use version 0.9.11.1 of Scan Tailor and I didn't find an option to correct the lighting. Can you give me the link to the experimental version? I think the problem with that image is, as Tulon said, that it was resized when it was first uploaded to the internet, so it isn't the same source image ChrisG worked with. Here we are working with an image that has less information, and that's why Scan Tailor produced a bad result.

What you did with ImageMagick looks great. I will consider it as an alternative to Scan Tailor: running those commands evens out the lighting and produces a good-quality image from a low-resolution one, which will help with OCR. But Scan Tailor apparently can't handle the new image processed with ImageMagick, because it damages the text.

Here are the ImageMagick commands I found in your link, if anyone wants to use them:

# Stretch the contrast to use the full black-to-white range
convert text_scan.png -normalize text_scan_norm.png

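# Divide the scan by a heavily blurred copy of itself (an estimate of the
# uneven background lighting), which flattens the background to near white
convert text_scan.png \( +clone -blur 0x20 \) \
          -compose Divide_Src -composite  text_scan_divide.png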
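The first command, `-normalize`, is a global contrast stretch. A simplified pure-Python version on a list of 0-255 gray values is sketched below; it is an illustration under assumptions, not ImageMagick's implementation (ImageMagick also clips a small fraction of the darkest and brightest pixels, which this sketch skips). Because every pixel goes through the same linear mapping, a shadow on one side of the page is stretched along with everything else, which is why the second, divide-based command is the one that actually evens out the lighting.

```python
def stretch_contrast(pixels, low=0, high=255):
    """Linearly map the darkest pixel to `low` and the brightest to `high`."""
    darkest, brightest = min(pixels), max(pixels)
    if brightest == darkest:
        return list(pixels)  # flat input: nothing to stretch
    span = brightest - darkest
    return [round(low + (p - darkest) * (high - low) / span) for p in pixels]


print(stretch_contrast([60, 90, 120, 180]))  # [0, 64, 128, 255]
```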
fonoblax

Re: How to get the same good result for this picture using Scan Tailor

Post by fonoblax »

You are right. I used the same website (TinyPic) to host my images here too, and they were automatically resized by that website: the last two images in my first post had a resolution of 2589x3752, but when uploaded they were resized to 1104x1600. I don't think the forum modifies the images.
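For a sense of how much that resize throws away, the arithmetic is sketched below. The 2589x3752 and 1104x1600 sizes come from the post; the 300 DPI capture resolution is a hypothetical example, not a figure from the thread.

```python
# Worked arithmetic for the resize described above (2589x3752 -> 1104x1600).
# The 300 DPI capture resolution is a hypothetical example.

orig_w, orig_h = 2589, 3752
new_w, new_h = 1104, 1600

scale = new_h / orig_h  # linear scale factor (the width scale is nearly identical)
pixel_ratio = (new_w * new_h) / (orig_w * orig_h)

print(f"width scale:  {new_w / orig_w:.3f}")      # width scale:  0.426
print(f"height scale: {scale:.3f}")               # height scale: 0.426
print(f"pixels kept:  {pixel_ratio:.1%}")         # pixels kept:  18.2%
print(f"300 DPI becomes ~{300 * scale:.0f} DPI")  # 300 DPI becomes ~128 DPI
```

Both sides shrink by the same factor, so the page shape is preserved, but only about a fifth of the pixels survive, and the effective DPI drops by the linear factor.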
Tulon

Re: How to get the same good result for this picture using Scan Tailor

Post by Tulon »

Scan Tailor Experimental releases are here: https://github.com/Tulon/scantailor/releases
It's not going to help with lighting problems though.

When output mode is B/W or Mixed, illumination equalization is always on, and there is no option to disable it. In Color / Grayscale output mode, there is a checkbox to enable or disable it.
fonoblax

Re: How to get the same good result for this picture using Scan Tailor

Post by fonoblax »

Thank you, Tulon, for the response!
I read in another post that you are the developer of Scan Tailor. I have worked a lot with your software and it has helped me, especially because when I scan chapters of a book I hold the camera by hand. I can't tell you how tiring it is: holding the camera while keeping the book flat with the other hand is exhausting :D The images I capture all have different orientations and depths, but in the end Scan Tailor fixes everything and generates images that can be processed by OCR. You should be proud of your very professional work, and I hope others will be inspired to make software like this ;)
Tulon

Re: How to get the same good result for this picture using Scan Tailor

Post by Tulon »

Thanks, fonoblax.

BTW, ST Experimental does a better job at dewarping, so you may want to give it a try after all.
fonoblax

Re: How to get the same good result for this picture using Scan Tailor

Post by fonoblax »

Can I install the experimental version in another directory so that I don't lose the classic version, or will there be a conflict between them? I will give it a try, since I need the dewarping feature to flatten pictures of curved pages. I have also heard that there is a command-line version; is there any benefit to using it?

I have another question, not related to the topic: it's about creating an EPUB from OCR-generated text. Do I need to keep every page as a separate text file and import all the page files into software like Calibre so that the EPUB pagination matches the original pagination of the scanned book? Or will I have to merge all the text together and work with it in Calibre?

I have searched Google for this, but I don't know the right English terms to find a tutorial about it.
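One possible approach, sketched in Python under stated assumptions: EPUB 3 can record where each printed page began using page-break markers (`epub:type="pagebreak"`), so the page text can be merged into larger files while still keeping the original page numbers. The hypothetical function below merges a list of per-page OCR texts into one XHTML chapter with such markers. It does not handle paragraphs split across pages, hyphenation, or packaging the EPUB, and whether Calibre preserves these markers on conversion is not verified here.

```python
# Hypothetical sketch: merge per-page OCR text into one XHTML chapter while
# marking where each original page began, using EPUB 3 page-break markers.
import html


def pages_to_xhtml(pages, first_page_number=1):
    """Return an XHTML string with one page-break marker per input page."""
    parts = [
        '<?xml version="1.0" encoding="utf-8"?>',
        '<html xmlns="http://www.w3.org/1999/xhtml" '
        'xmlns:epub="http://www.idpf.org/2007/ops">',
        "<head><title>Chapter</title></head>",
        "<body>",
    ]
    for offset, text in enumerate(pages):
        number = first_page_number + offset
        parts.append(
            f'<span epub:type="pagebreak" role="doc-pagebreak" '
            f'id="page{number}" aria-label="{number}"/>'
        )
        # Treat blank lines in the OCR text as paragraph boundaries.
        for paragraph in text.split("\n\n"):
            if paragraph.strip():
                parts.append(f"<p>{html.escape(paragraph.strip())}</p>")
    parts += ["</body>", "</html>"]
    return "\n".join(parts)


print(pages_to_xhtml(["First page text.", "Second page.\n\nNew paragraph."], 12))
```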