Auto vs Manual for "Split Pages"

Scan Tailor specific announcements, releases, workflows, tips, etc. NO FEATURE REQUESTS IN THIS FORUM, please.

Moderator: peterZ

univurshul
Posts: 496
Joined: 04 Mar 2014, 00:53

Re: Auto vs Manual for "Split Pages"

Post by univurshul »

x
Last edited by Anonymous on 15 Nov 2010, 11:15, edited 2 times in total.
spamsickle
Posts: 596
Joined: 06 Jun 2009, 23:57

Re: Auto vs Manual for "Split Pages"

Post by spamsickle »

Tulon wrote: In both samples we have the same problem: they fail Scan Tailor's expectations.
I guess that makes me and Scan Tailor even.
Tulon wrote: It's actually quite hard to tell a book's edge from its spine, so ST doesn't even try.
It isn't hard for me.

People can easily identify page splits and page boundaries without any mistakes, and for most of our DIY scans those values aren't going to vary much from page to page within a book. "Apply to" seems like an excellent solution to these types of problems for the kind of scanning I'm doing. If that was an option, Scan Tailor wouldn't have to concern itself with all the odd bits of input I might be throwing at it. I could do what I do well -- telling my spine from my elbow -- and Scan Tailor could do what it does well -- binarizing text and creating bang-up TIFFs.

I know, I know, I'm going to have to do it myself. Knowing that I'm hoping to twist your application, making it do things you wouldn't want to watch, what's the chance of getting a few minutes on the phone (or Skype?) to talk about the architecture of Scan Tailor -- these filters, background tasks, cache-driven tasks, and what not? Would my odds improve if I became a SourceForge donor? Or are Doxygen and the source code my only friends?
Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: Auto vs Manual for "Split Pages"

Post by Misty »

Tulon wrote:
Misty wrote: My experience has typically been that pages manually set to have no splits process just as well as pages which were split, without incorrectly selecting anything outside the page for the content area.
By not cutting off the neighbouring page, you make the life of "Select Content" stage harder. It may or may not be able to cope with that.
All right, that makes sense. However, I was wondering if anyone had any examples of that actually occurring. In my experience, Scan Tailor's Select Content stage rarely if ever makes mistakes regardless of what the split area is, so there's more of a chance of introducing errors by using the split feature than by not using it. I'm just not sure if my experience is typical.
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.
Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: Auto vs Manual for "Split Pages"

Post by Tulon »

spamsickle wrote: "Apply to" seems like an excellent solution to these types of problems for the kind of scanning I'm doing. If that was an option, Scan Tailor wouldn't have to concern itself with all the odd bits of input I might be throwing at it. I could do what I do well -- telling my spine from my elbow -- and Scan Tailor could do what it does well -- binarizing text and creating bang-up TIFFs.
Well, my answer to that should really be in the FAQ, because this and similar questions get asked a lot. The reason I don't want the ability to apply cutter geometry to multiple pages is that it can rarely help but can easily hurt. I would say most Scan Tailor users are using it with flatbed scanners, and this feature will never help them. Now, having a button, an option or whatever that's never going to help you is a terrible idea. Also keep in mind the lack of undo functionality in Scan Tailor. Every button, every option in Scan Tailor exists because it was really hard to get the job done without it. The feature you are suggesting doesn't fit into that category IMHO.
Of course you could implement it yourself. Open source is all about being able to do just that.
spamsickle wrote: Knowing that I'm hoping to twist your application, making it do things you wouldn't want to watch, what's the chance of getting a few minutes on the phone (or Skype?) to talk about the architecture of Scan Tailor -- these filters, background tasks, cache-driven tasks, and what not?
I guess it's possible, but you should be asking the right questions, that is, those that will actually help you achieve what you want. Knowing about background processing and cache-driven tasks won't help you. A good first question would be "Where do I start (to get that feature implemented)?". Here is my answer:
The cutter geometry is represented by the page_split::PageLayout class. Judging from the namespace, it's part of the "Split Pages" stage. Stages are also referred to as filters in Scan Tailor's source code. Each stage has a number of identically named classes (they are in different namespaces though). The ones of interest to you are the Settings and OptionsWidget classes. The Settings class stores stage-specific parameters for every page. The page_split::Settings class would store page_split::PageLayout among other things. The OptionsWidget class represents the stage-specific left panel of Scan Tailor's main window. Your "Apply to" button will go there.
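The structure described above can be modelled roughly as follows. This is a minimal sketch in Python, not Scan Tailor's C++ source: only the names PageLayout, Settings and OptionsWidget come from the post, while every field, method and page identifier used here (kind, split_x, set_layout, layout_for, apply_to, the "scan_00N" ids) is a hypothetical stand-in for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class PageLayout:
    """Hypothetical stand-in for page_split::PageLayout (the cutter geometry)."""
    kind: str                 # assumed values, e.g. "single" or "two_pages"
    split_x: Optional[float]  # x position of the split line; None means no split


class Settings:
    """Hypothetical stand-in for page_split::Settings: per-page parameters."""

    def __init__(self) -> None:
        self._layouts: Dict[str, PageLayout] = {}

    def set_layout(self, page_id: str, layout: PageLayout) -> None:
        self._layouts[page_id] = layout

    def layout_for(self, page_id: str) -> Optional[PageLayout]:
        return self._layouts.get(page_id)

    def apply_to(self, source_id: str, targets: Iterable[str]) -> int:
        """Copy one page's layout to other pages; return how many were updated."""
        layout = self._layouts[source_id]
        count = 0
        for page_id in targets:
            self._layouts[page_id] = layout
            count += 1
        return count


# Set the split once by hand, then reuse it for the remaining pages.
settings = Settings()
settings.set_layout("scan_001", PageLayout(kind="two_pages", split_x=1510.0))
updated = settings.apply_to("scan_001", ["scan_002", "scan_003"])
```

An "Apply to" button in the OptionsWidget would presumably end up calling something like apply_to; how that maps onto the real classes has to be checked against the Scan Tailor source.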
spamsickle wrote: Would my odds improve if I became a SourceForge donor?
A donation made with the purpose of getting something in return is not a donation but an investment. In this particular case it would be a bad investment, as I am not prepared to do any favours in exchange for a donation.
Scan Tailor experimental doesn't output 96 DPI images. It's just what your software shows when DPI information is missing. Usually what you get is input DPI times the resolution enhancement factor.
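As a worked example of that last sentence, here is a rough sketch in Python. The function names and the sample numbers are assumptions for illustration only; the 96 DPI fallback is the value mentioned above, and real viewers may use a different default.

```python
from typing import Optional


def expected_output_dpi(input_dpi: float, enhancement_factor: float) -> float:
    """Output DPI as described above: input DPI times the enhancement factor."""
    return input_dpi * enhancement_factor


def dpi_shown_by_viewer(stored_dpi: Optional[float], fallback: float = 96.0) -> float:
    """A viewer that finds no DPI metadata may simply report a fallback value."""
    return fallback if stored_dpi is None else stored_dpi


out_dpi = expected_output_dpi(300, 2)  # 300 DPI scan, assumed factor of 2
shown = dpi_shown_by_viewer(None)      # image file carrying no DPI metadata
```

With these assumed inputs the image would be 600 DPI, while a viewer that sees no metadata would still report 96.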
univurshul
Posts: 496
Joined: 04 Mar 2014, 00:53

Re: Auto vs Manual for "Split Pages"

Post by univurshul »

x
paulica
Posts: 5
Joined: 04 Mar 2014, 00:53

Re: Auto vs Manual for "Split Pages"

Post by paulica »

Maybe I have missed this, but have you tried using a slim strip of black duct tape right in the middle? Some service providers using planetary book scanners use this trick to improve the accuracy of deskew, page splitting and cropping. I've heard of some also using it to help with curvature correction. Generally they use a black stick, but for books that don't have the writing running well into the binding, duct tape would be interesting for a DIY book scanner.
o3h1p
Posts: 71
Joined: 08 Nov 2010, 22:47

Re: Auto vs Manual for "Split Pages"

Post by o3h1p »

Jon, I'm running into a similar problem and I wanted to share my solution in case you didn't find one. Or for posterity.

As you noticed, ST doesn't do a good job finding the platen seam when it catches the reflection of the other page. This problem is particularly bad when you have two-column texts and much worse when you have figures within those columns. The problem is so bad that 20% of my pages are incorrectly split (lots of figures I guess).

Doing a manual clip after the reflection line would solve the problem but, as you noted, you can't do a manual clip across all pages. If you instead set 'split pages' to the no-split-line option and apply that to all pages, it selects the entire photo and effectively disables 'split pages'. Then you can use your manual selection box to select content across all pages.

I've found this method to be much faster than letting ST try and find the content and going over and correcting the many mistakes--my books are just far too complex for its algorithm I guess.

Hope this helps someone.
jack
andre_lafayette
Posts: 7
Joined: 27 Jun 2013, 05:14
E-book readers owned: Kindle
Number of books owned: 100
Country: United Kingdom

Re: Auto vs Manual for "Split Pages"

Post by andre_lafayette »

Jack,

Thank you for posting this tip for posterity! I am trying to process books with lots of figures in them. Very frequently, Scan Tailor incorrectly detects the bounding box of these figures as the edge of the page, which then affects the performance of the "Select Content" step. I was considering implementing the complicated solution of applying a custom page split to all pages by editing the project file when I found your tip. Indeed, manually setting the page layout to single pages and then applying it to all pages effectively disabled 'split pages' and solved the problem!! Many thanks!