
ST just for splitting pages

Posted: 05 Sep 2016, 14:49
by The Purple Parrot
Dear Pals,
Is there a way to use Scan Tailor just for splitting pages? I have found that ABBYY FineReader 12 is very good for everything else, but Scan Tailor seems to be superior at splitting pages automatically.

Re: ST just for splitting pages

Posted: 06 Sep 2016, 10:03
by Tulon
ST wasn't designed to be used like that. You can still do it, though not fully automatically, which probably defeats the point in your situation. The manual part is going to be setting the content box to cover each page.

Re: ST just for splitting pages

Posted: 17 Mar 2017, 16:30
by jaffamuffin
Yes please! This is the most wanted feature; I asked about it years ago. Scan Tailor has by far the best page-splitting algorithms, and the best way to correct any errors. The only way it can work currently is to run the page split, then set up full-page content and 0 margins, and save the ST project files in case you need to go back and edit.

It would be amazing if there were some kind of "point it at a directory and split all images within" setting.

Alternatively, can anyone point me to the algorithms used, so that perhaps a standalone utility using the same processes could be developed?

Re: ST just for splitting pages

Posted: 17 Mar 2017, 18:36
by Tulon
I am no longer involved in Scan Tailor development, though I can point you to the relevant code.

The page splitting algorithm is roughly as follows:
  1. Do some pre-processing to suppress everything other than more or less vertical lines (see filters/page_split/VertLineFinder.cpp)
  2. Find such lines in the pre-processed image with a Hough transform (see imageproc/HoughLineDetector.cpp)
  3. Use heuristics to pick one splitting line or two bounding lines (see filters/page_split/PageLayoutEstimator.cpp)
If we are splitting two-page images, in step 3 we just pick the most "central" of the lines found in step 2 and use it as the splitting line.
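For anyone wanting to experiment with a standalone tool, here is a heavily simplified, pure-Python sketch of those three steps for the two-page case. It is not Scan Tailor's code: the binary list-of-lists image format, the vertical-run filter used as a stand-in for the pre-processing in VertLineFinder.cpp, the ±5° angle range, the 60% vote threshold and all function names are assumptions made for illustration.

```python
import math

# Assumption: images are lists of rows of 0/1 ints, where 1 marks a dark
# (foreground) pixel. Real scans would need to be binarized first.

def keep_vertical_runs(img, min_run):
    """Step 1 stand-in (not Scan Tailor's method): keep only dark pixels that
    lie in a vertical run of at least `min_run` pixels, which is equivalent to
    a morphological opening with a vertical line of that length."""
    height, width = len(img), len(img[0])
    out = [[0] * width for _ in range(height)]
    for x in range(width):
        y = 0
        while y < height:
            if img[y][x]:
                start = y
                while y < height and img[y][x]:
                    y += 1
                if y - start >= min_run:
                    for yy in range(start, y):
                        out[yy][x] = 1
            else:
                y += 1
    return out

def hough_near_vertical(img, max_angle_deg=5.0, step_deg=0.5):
    """Step 2 sketch: vote for lines x*cos(t) + y*sin(t) = rho, with t limited
    to small angles so that only near-vertical lines are considered."""
    n = int(round(max_angle_deg / step_deg))
    thetas = [math.radians(i * step_deg) for i in range(-n, n + 1)]
    votes = {}
    for y, row in enumerate(img):
        for x, dark in enumerate(row):
            if not dark:
                continue
            for ti, t in enumerate(thetas):
                key = (ti, int(round(x * math.cos(t) + y * math.sin(t))))
                votes[key] = votes.get(key, 0) + 1
    return thetas, votes

def most_central_line_x(img, min_fraction=0.6):
    """Step 3 (two-page case only): among lines with enough votes, return the
    x position at mid-height of the line closest to the image centre, or None
    if no line qualifies."""
    height, width = len(img), len(img[0])
    thetas, votes = hough_near_vertical(img)
    y_mid, x_centre = height / 2.0, width / 2.0
    best = None
    for (ti, rho), count in votes.items():
        if count < min_fraction * height:
            continue
        t = thetas[ti]
        x = (rho - y_mid * math.sin(t)) / math.cos(t)  # solve for x at y_mid
        if best is None or abs(x - x_centre) < abs(best - x_centre):
            best = x
    return None if best is None else int(round(best))

def split_at(img, x):
    """Cut the image into left and right pages at column x."""
    return [row[:x] for row in img], [row[x:] for row in img]

# Synthetic 40x40 spread: page edges at x=2 and x=37, gutter at x=18,
# plus a short horizontal stroke standing in for text.
H = W = 40
img = [[1 if x in (2, 18, 37) else 0 for x in range(W)] for _ in range(H)]
for x in range(5, 15):
    img[10][x] = 1
lines = keep_vertical_runs(img, min_run=20)
split_x = most_central_line_x(lines)
left, right = split_at(img, split_x)
print(split_x, len(left[0]), len(right[0]))  # 18 18 22
```

Restricting the Hough angles to a few degrees around vertical keeps the accumulator small and ignores text lines entirely. The real heuristics in PageLayoutEstimator.cpp are more involved (for example, choosing two bounding lines for single pages), which this sketch does not attempt.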

Re: ST just for splitting pages

Posted: 05 Apr 2017, 09:40
by jaffamuffin
Thank you. I will look into this at some point.