Conditions for Increased Page Splitting Errors

univurshul
Posts: 496
Joined: 04 Mar 2014, 00:53

Conditions for Increased Page Splitting Errors

Post by univurshul »

Are there any conditions that would cause a higher percentage of page splitting errors?

And what improves splitting accuracy?
Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK
Contact:

Re: Conditions for Increased Page Splitting Errors

Post by Tulon »

Many different things can cause page splitting errors. Usually, when looking at an incorrectly placed split line, it's obvious what went wrong. If not, you could always post a screenshot here.
Scan Tailor experimental doesn't output 96 DPI images. It's just what your software shows when DPI information is missing. Usually what you get is input DPI times the resolution enhancement factor.
univurshul

Re: Conditions for Increased Page Splitting Errors

Post by univurshul »

Tulon wrote: Many different things can cause page splitting errors. Usually, when looking at an incorrectly placed split line, it's obvious what went wrong. If not, you could always post a screenshot here.
Yeah, the obvious split errors are an understandable non-issue here.

What I'm seeing is that the split at the book's binding is very accurate most of the time. The split at the outside edge of the book is what's causing hundreds of manual adjustments. Some of these bad splits are explainable, but many place the split line in an area of the page for no reason I can see.
[Attachment: 1.jpg]
I have the same issues with 'Select Content': hundreds of manual edits to widen the crop area so that it captures all the text in the image.
[Attachment: 2.jpg]

I must have the 'perfect storm' book for errors. Some reasons why I think this book is giving me high error rates:
Content selection seems to draw its boundaries around the larger or darker text.
I may have overexposed the scan.
The text blocks are separated and spread around the page.


Otherwise this is really incredible software, and I'm assuming some manual edits are normal. Still, 600 manual edits in a 1200-page book points to an under- or overexposed scan, or possibly something else.

I am running 0.9.9.1 on Mac OS X 10.6.
Tulon

Re: Conditions for Increased Page Splitting Errors

Post by Tulon »

Looks odd. I would need a raw input file for testing. You can PM me one. Use rapidshare or whatever if it's over the size limit.
univurshul

Re: Conditions for Increased Page Splitting Errors

Post by univurshul »

I have some raw jpegs in a zip-file loaded into my public DropBox account here: http://dl.dropbox.com/u/7332578/ScanTailor%20Tests.zip

I'm not sure how many of these images contain really troublesome areas for testing 'splitting' and 'content', as I've already made it through most of this book manually.

I will send you more images once I process another book with similar issues.

Regards.
Tulon

Re: Conditions for Increased Page Splitting Errors

Post by Tulon »

The above link doesn't work.
Also, I don't need many pages, I only really need one.
univurshul

Re: Conditions for Increased Page Splitting Errors

Post by univurshul »

Here are 3 raw images (unsure if they'll exhibit the noted issues): http://dl.dropbox.com/u/7332578/Archive.zip

Wait a couple of minutes for it to upload. If you want more images to test, I'll re-activate the above link.

Thanks again
Tulon

Re: Conditions for Increased Page Splitting Errors

Post by Tulon »

These work fine here.
I suggest you find the ones that are definitely failing and post those.
univurshul

Re: Conditions for Increased Page Splitting Errors

Post by univurshul »

I'll hand select some troublesome splitting & content selection pages once I have freed some time on my machine; Scan Tailor is running 24/7 this week on a few intensive projects...

Most books are splitting normally. With some books, though, the single-page auto-split seems to require the platen seam to be very close to the edge of the image; otherwise the software reaches past the obvious split line and includes imagery reflected onto the adjacent glass pane. I think reflections from my lights are part of the cause. With better camera work, the inaccuracy of the splitting could probably be alleviated. Does this sound about right?

One thing I did try was reprocessing the entire set of output TIFFs as a new project, letting the auto feature re-draw the splits on the previously processed images. It worked for the most part, with a good hour of manual adjustments on top. I only tried this on black and white text. Is this a good idea when dealing with every other page incorrectly splitting? Or does processing over previously processed images destroy the quality? (I am outputting at 750 DPI, so the resolution is high; I'm just curious whether a reprocess does damage no matter what DPI is used.)
Tulon

Re: Conditions for Increased Page Splitting Errors

Post by Tulon »

univurshul wrote: Is this a good idea when dealing with every other page incorrectly splitting? Or does processing over previously processed images destroy the quality?
It will decrease the quality for sure, but depending on the quality of the original material you may or may not notice that.
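
One plausible reason for the loss, offered here as an assumption and not something stated in this thread, is that every pass that moves or rescales pixels has to interpolate between them, and interpolation softens edges a little each time. The minimal Python sketch below is a toy model and not Scan Tailor's code: the function names and the one-dimensional "scan line" are hypothetical, and a half-pixel shift with linear interpolation stands in for whatever resampling a real pass performs.

```python
def shift_half_pixel(row):
    """Resample a row at a half-pixel offset using linear interpolation."""
    # Each output sample is the average of two neighbouring input samples;
    # the last sample is repeated so the row keeps its length.
    padded = row + [row[-1]]
    return [(padded[i] + padded[i + 1]) / 2 for i in range(len(row))]


def edge_sharpness(row):
    """Largest jump between adjacent pixels (1.0 means a perfectly crisp edge)."""
    return max(abs(b - a) for a, b in zip(row, row[1:]))


# A hypothetical scan line: white paper (1.0) meeting black ink (0.0).
original = [1.0] * 8 + [0.0] * 8

row = original
for generation in range(1, 5):
    # Each loop iteration models one more processing pass over the same data.
    row = shift_half_pixel(row)
    print(f"pass {generation}: edge sharpness = {edge_sharpness(row):.3f}")
```

In this toy model the edge never gets sharper from one pass to the next, and it is noticeably softer after a few passes. Whether that matters in practice depends, as noted above, on the quality of the original material.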