Scan Tailor

Scan Tailor specific announcements, releases, workflows, tips, etc. NO FEATURE REQUESTS IN THIS FORUM, please.

Moderator: peterZ

daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Scan Tailor

Post by daniel_reetz »

Tulon, did you delete that installer? I just got done installing the last one, went to install this one, except the link redirects to onlinedisk.ru's homepage.
Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: Scan Tailor

Post by Tulon »

daniel_reetz wrote:Tulon, did you delete that installer? I just got done installing the last one, went to install this one, except the link redirects to onlinedisk.ru's homepage.
I didn't. Anyway, here is another link.
Scan Tailor experimental doesn't output 96 DPI images. It's just what your software shows when DPI information is missing. Usually what you get is input DPI times the resolution enhancement factor.
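To make that DPI arithmetic concrete, here is a minimal Python sketch. It assumes a 300 DPI scan, an enhancement factor of 2, a letter-width page, and a viewer that falls back to 96 DPI when the tag is missing. The helper names are hypothetical.

```python
# Hypothetical helpers illustrating the DPI arithmetic described above.
VIEWER_FALLBACK_DPI = 96  # what many viewers report when an image carries no DPI tag


def expected_output_dpi(input_dpi, enhancement_factor):
    """Output DPI is usually the input DPI times the resolution enhancement factor."""
    return input_dpi * enhancement_factor


def apparent_width_inches(pixel_width, dpi=None):
    """Physical width a viewer implies; falls back to 96 DPI if dpi is missing."""
    return pixel_width / (dpi if dpi else VIEWER_FALLBACK_DPI)


if __name__ == "__main__":
    out_dpi = expected_output_dpi(300, 2)            # 600
    width_px = int(8.5 * out_dpi)                    # 5100 px for a letter-width page
    print(out_dpi, width_px)
    print(apparent_width_inches(width_px, out_dpi))  # 8.5 in with the DPI tag present
    print(apparent_width_inches(width_px))           # 53.125 in if the viewer assumes 96
```

The pixel data is identical in both cases. Only the physical size that a program infers from it changes.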
SuperNibbler
Posts: 1
Joined: 04 Mar 2014, 00:52

Re: Scan Tailor

Post by SuperNibbler »

We are finding ScanTailor very interesting for our work here at the University of Florida Digital Collections:
http://ufdcweb1.uflib.ufl.edu/ufdc/?m=lhh
However, is there a way to turn off the LZW compression?
Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: Scan Tailor

Post by Tulon »

SuperNibbler wrote:However, is there a way to turn off the LZW compression?
There is no such option. What would be the point? It's lossless compression, and the patents on it have expired.
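Tulon's point that LZW is lossless can be checked with a minimal textbook LZW round trip in Python. This is only a sketch. It emits integer codes and ignores the TIFF-specific details (variable code width, the clear and end-of-information codes, bit packing), so it is not a TIFF LZW codec.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Textbook LZW: grow a dictionary of byte strings, emit one code per longest match."""
    table = {bytes([i]): i for i in range(256)}
    w = b""
    out = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                      # keep extending the current match
        else:
            out.append(table[w])        # emit code for the longest known prefix
            table[wc] = len(table)      # learn the new string
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out


def lzw_decode(codes: list[int]) -> bytes:
    """Rebuild the same dictionary from the code stream and expand it back to bytes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = w + w[:1]           # the "KwKwK" case: code not yet in the table
        else:
            raise ValueError(f"bad LZW code {code}")
        out.append(entry)
        table[len(table)] = w + entry[:1]
        w = entry
    return b"".join(out)


if __name__ == "__main__":
    data = b"ab" * 100
    codes = lzw_encode(data)
    print(len(data), "bytes ->", len(codes), "codes; round trip ok:", lzw_decode(codes) == data)
```

Decoding reproduces the input byte for byte, so leaving LZW on costs nothing in fidelity.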
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Scan Tailor

Post by daniel_reetz »

I've been putting this version of ST through torture tests on a few different machines now. I can't get it to crash. I've tried some weird stuff: impossibly large images (resized in other apps), screwing with custom DPI settings on output, etc.

Setting output DPI to 1200 nearly locks up my CPU, but ST still keeps working. I've tried non-book images, really ugly handheld photographs of book pages, and while sometimes the input was so bad that the output sucked, generally speaking ST did a great job.
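The slowdown at 1200 DPI is expected because pixel count grows with the square of the DPI. Here is a quick back-of-envelope in Python. It assumes a US Letter page (8.5 x 11 in) and uncompressed 8-bit grayscale or 24-bit RGB buffers. These sizes are illustrative, not measured from ST.

```python
def page_pixels(width_in, height_in, dpi):
    """Pixel count of a page rendered at the given DPI."""
    return round(width_in * dpi) * round(height_in * dpi)


def buffer_mb(pixels, bytes_per_pixel):
    """Size of one uncompressed image buffer in megabytes (10**6 bytes)."""
    return pixels * bytes_per_pixel / 1e6


if __name__ == "__main__":
    for dpi in (300, 600, 1200):
        px = page_pixels(8.5, 11, dpi)
        print(f"{dpi:>5} DPI: {px:>11,} px, "
              f"gray {buffer_mb(px, 1):7.1f} MB, RGB {buffer_mb(px, 3):7.1f} MB")
```

Going from 300 to 1200 DPI multiplies the pixel count by 16, and that is before counting any intermediate copies the processing keeps in memory.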

I'm very impressed with the despeckling preview. All of the choices ST made on these example scans from my very first scanner were appropriate. I'm still having occasional problems with page split detection. From the same zip file, it failed on:
IMG_2573
IMG_2574
IMG_2756
IMG_2758
IMG_2759

All failed in the same way. I don't know if you really need to download that file to get the idea. ST split the pages consistently at the rightmost edge of the pages instead of the gutter. I realize that might not be the kind of feedback you're looking for, but there it is anyway. I'll continue beating on ST in the background as I work on other things, and see if I can't get it to fail in other, more interesting ways.

One other thing that came up was that the cover image sometimes came out well, and sometimes didn't. I'm attaching an after/before image. I suspect that this is me using ST incorrectly, but I stupidly didn't record the settings I used to break things in this fashion. Oh, end-users.
Attachments
BHT.jpg (75.25 KiB)
Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: Scan Tailor

Post by Tulon »

daniel_reetz wrote:I'm still having occasional problems with page split detection.
I took a look at those. In all cases it did find the gutter, along with some other lines, like the outer edge of a page or picture edges. The problem is it can't reliably tell them apart. I currently use some heuristics for that purpose. We need something better than that. I am thinking about applying the Fourier transform to areas beyond the leftmost and rightmost lines, to check which one of them has more high-frequency components. I might try it some day, though I was going to focus on dewarping next. A contribution of such code would obviously be welcome.
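Tulon only outlines this idea. One possible reading of it, sketched in Python with NumPy, measures what fraction of each area's spectral energy lies above a cutoff frequency. The post does not state the following assumption: the area beyond the gutter holds the facing page's text, which has a lot of high-frequency energy, while the area beyond an outer edge is mostly smooth background. The cutoff value and the function names are hypothetical.

```python
import numpy as np


def high_freq_ratio(region, cutoff=0.25):
    """Fraction of 2-D spectral energy above `cutoff` (1.0 = Nyquist along an axis)."""
    region = np.asarray(region, dtype=float)
    region = region - region.mean()                 # remove DC so flat areas score 0
    power = np.abs(np.fft.fft2(region)) ** 2
    fy = np.fft.fftfreq(region.shape[0])[:, None]   # cycles/pixel, in [-0.5, 0.5)
    fx = np.fft.fftfreq(region.shape[1])[None, :]
    radius = np.hypot(fx, fy) / 0.5                 # normalise so axis Nyquist = 1
    total = power.sum()
    return 0.0 if total == 0 else float(power[radius > cutoff].sum() / total)


def pick_gutter(gray, left_x, right_x, cutoff=0.25):
    """Guess which of two candidate vertical lines is the gutter.

    Returns "left" or "right": the line whose outer area has more
    high-frequency energy (assumed to be the facing page's text).
    """
    gray = np.asarray(gray, dtype=float)
    left_area, right_area = gray[:, :left_x], gray[:, right_x:]
    left = high_freq_ratio(left_area, cutoff) if left_area.size else 0.0
    right = high_freq_ratio(right_area, cutoff) if right_area.size else 0.0
    return "left" if left > right else "right"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    page = np.full((64, 64), 200.0)                     # smooth background everywhere
    page[:, 48:] = rng.integers(0, 256, size=(64, 16))  # "text-like" noise beyond x = 48
    print(pick_gutter(page, 16, 48))
```

Real scans would need the areas cropped to the same size, and probably windowed, before the ratios are comparable. The synthetic page above only shows the mechanics.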
daniel_reetz wrote:One other thing that came up was that the cover image sometimes came out well, and sometimes didn't.
ST's binarization doesn't really work for non-document images. I suggest outputting the cover page in "Color / Grayscale" mode. Don't enable illumination equalization there - that also doesn't work for non-document images.
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Scan Tailor

Post by daniel_reetz »

Tulon wrote:
daniel_reetz wrote:I'm still having occasional problems with page split detection.
I might try it some day, though I was going to focus on dewarping next. A contribution of such code would obviously be welcome.
I think Spamsickle was looking at some different ideas regarding page splitting. For what it's worth, I think dewarping is more important than a few missed page splits. I've been teaching myself C with these tutorials, but it's going to be a year or two before I'm of any use to you. :)
spamsickle
Posts: 596
Joined: 06 Jun 2009, 23:57

Re: Scan Tailor

Post by spamsickle »

I assume the "for_tulon" in your link means it re-directs other IPs, because I just get a generic home page.

My alternative method for page splitting is really simple, and depends on the kind of pages we get from our scanners. Since I can't test your examples myself, you might try running them through YAPP to see if it detects the page any better than ST.

Basically, I expect 3 "dark" edges and one not-so-dark edge where the gutter will be. The trapezoid at that not-so-dark edge is assumed to narrow from the edge to the gutter, where the page would be split.

The algorithm just adds the values of pixels in rows or columns. It takes a quick "crosshair" measurement through the center of the image, finds the 3 dark edges and the 1 lighter edge, and works its way in from the edges until the 3 dark edges go from dark to light and the lighter edge gets close enough to the center-of-image value to assume we've found the gutter. It's very fast, and reasonably accurate at finding all four edges of the page, though for ST integration it would only need to find the one edge.
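Here is a rough Python sketch of the row/column approach described above, for anyone who wants to compare it with ST. Several details are guesses filled in for illustration: the thresholds (`light_frac`, `gutter_tol`), the convention that 0 is black, and the choice of the brightest border as the gutter side. The trapezoid geometry on the gutter side is ignored. This is not spamsickle's actual code.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def find_page_bounds(gray, light_frac=0.5, gutter_tol=0.1):
    """Locate the page in a grayscale image given as a list of rows (0 = black).

    Returns (gutter_edge, bounds), where bounds maps "top"/"bottom"/"left"/"right"
    to the first row or column index, walking inward, that looks like the page.
    """
    h, w = len(gray), len(gray[0])
    row_prof = [mean(row) for row in gray]                        # per-row average
    col_prof = [mean(gray[y][x] for y in range(h)) for x in range(w)]
    # Quick "crosshair" reading: centre row and centre column averaged together.
    center = (mean(gray[h // 2]) + mean(gray[y][w // 2] for y in range(h))) / 2
    walks = {  # edge -> (profile, indices from that edge inward)
        "top": (row_prof, range(h)),
        "bottom": (row_prof, range(h - 1, -1, -1)),
        "left": (col_prof, range(w)),
        "right": (col_prof, range(w - 1, -1, -1)),
    }
    border = {edge: prof[idx[0]] for edge, (prof, idx) in walks.items()}
    gutter = max(border, key=border.get)   # the not-so-dark edge
    bounds = {}
    for edge, (prof, idx) in walks.items():
        bounds[edge] = None
        for i in idx:
            if edge == gutter:   # stop once the profile is close to the centre value
                hit = abs(prof[i] - center) <= gutter_tol * center
            else:                # stop once a dark edge turns light
                hit = prof[i] >= border[edge] + light_frac * (center - border[edge])
            if hit:
                bounds[edge] = i
                break
    return gutter, bounds


if __name__ == "__main__":
    img = [[20] * 30 for _ in range(20)]   # dark background on three sides
    for y in range(3, 17):
        for x in range(4, 30):
            img[y][x] = 220 if x < 26 else 120  # page, then a greyer strip toward the gutter
    print(find_page_bounds(img))
```

Each edge is just a linear scan over a precomputed profile, which is consistent with the "very fast" description.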

Coding it is trivial; what's slowing me up is understanding the ins and outs of ST well enough to know where my code should go. I have created a new icon for the page-split filter to denote "DIY-style" scans...
StevePoling
Posts: 290
Joined: 20 Jun 2009, 12:19
E-book readers owned: SONY PRS-505, Kindle DX
Number of books owned: 9999
Location: Grand Rapids, MI

Re: Scan Tailor

Post by StevePoling »

Tulon wrote:...I was going to focus on dewarping next...
Though an automatic dewarping algorithm we've seen described elsewhere is the cat's meow, I'd be pleased as punch with a simpler enhancement to the deskew functionality. At present, you can manually rotate the image. Please consider adding a slider-control to "tip" the image so that the right/left edge appears closer/farther than before. This would clear up any distortion due to keystoning and should be simple perspective geometry. As long as the original lines of text aren't curved, this would work perfectly. I think this operation is properly termed "deskewing" given the meaning of the word "skew."
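The "tip" slider Steve describes amounts to a perspective (homography) correction. Below is a minimal Python/NumPy sketch. It assumes the slider value `tip` says how much shorter the far (right) edge looks, as a fraction of the page height. Negative values tip the other way. `tip_corners`, `homography` and `warp_point` are hypothetical names, not ST code. Four corner correspondences fix the 3x3 matrix through the direct linear transform, with the bottom-right entry set to 1.

```python
import numpy as np


def homography(src, dst):
    """3x3 H with dst ~ H @ src for four (x, y) point pairs (DLT, h33 = 1)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(a, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def tip_corners(width, height, tip):
    """Hypothetical slider model: the right edge looks shorter by tip * height."""
    d = tip * height / 2
    return [(0, 0), (width, d), (width, height - d), (0, height)]


def warp_point(h, x, y):
    """Map one point through H, including the perspective divide."""
    u, v, w = h @ np.array([x, y, 1.0])
    return float(u / w), float(v / w)


if __name__ == "__main__":
    width, height = 100, 200
    rect = [(0, 0), (width, 0), (width, height), (0, height)]
    H = homography(tip_corners(width, height, 0.1), rect)  # keystoned page -> rectangle
    print(warp_point(H, 100, 10))   # the shortened right edge's top corner maps to (100, 0)
```

To resample a whole image, one would loop over output pixels and look each one up through the inverse of `H`. That step is left out here.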

I'd save dewarping curved text lines for a stretch goal. But you know better than I what you find interesting and how the code is internally organized (to make things easy/tedious), so I look forward
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Scan Tailor

Post by daniel_reetz »

spamsickle wrote:I assume the "for_tulon" in your link means it re-directs other IPs, because I just get a generic home page.
Stupidity on my part -- the link is to http://danreetz.com/for_tulon/BasicHand ... p.filepart, but the file is http://danreetz.com/for_tulon/BasicHand ... kScans.zip -- I copied the link before it uploaded. If I get time tonight, I'll have a look at YAPP.