select contents problems

jinjin12
Posts: 13
Joined: 01 Nov 2010, 01:09

select contents problems

Post by jinjin12 »

I've tested a couple of pages. For some pages, Select Content works well, but for others it only selects a couple of paragraphs and omits a lot of other information on the page. That's OK for a few pages because I can fix them myself, but with hundreds of pages this is a big problem. How can I solve it? Is the problem my input? Is it due to the lack of proper lighting on the pages?
Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: select contents problems

Post by Tulon »

jinjin12 wrote:Is the problem my input?
I can't tell until I've seen it. Also, tell me which input DPI you are specifying.
Scan Tailor experimental doesn't output 96 DPI images. It's just what your software shows when DPI information is missing. Usually what you get is input DPI times the resolution enhancement factor.
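
The DPI rule in the signature above comes down to one multiplication. A minimal sketch, assuming the hypothetical names `input_dpi` and `enhancement_factor` (neither is a Scan Tailor identifier), and keeping in mind that the signature says "usually", so this is not guaranteed for every project:

```python
def expected_output_dpi(input_dpi: int, enhancement_factor: int) -> int:
    """Estimate the output DPI as input DPI times the enhancement factor.

    Both parameter names are illustrative assumptions, not Scan Tailor settings.
    """
    if input_dpi <= 0 or enhancement_factor <= 0:
        raise ValueError("DPI and enhancement factor must be positive")
    return input_dpi * enhancement_factor


# Example: a 300 DPI scan with a factor of 2 would usually come out at 600 DPI.
print(expected_output_dpi(300, 2))
```

If a viewer reports 96 DPI for such a file, that points to missing DPI metadata rather than to the real resolution.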
Anonymous1

Re: select contents problems

Post by Anonymous1 »

Content selection is also a big issue for me, but only on really old books with tons of marks and smudges in the margins. I tried playing with it, and I think the biggest problem with content selection is finding the text blocks. The algorithm that's currently used finds a lot of false positives (like dots next to the content areas), but I'm not the one to fix it.

Here's a research paper on SWT (Stroke Width Transform), which would work perfectly for detecting text quickly: http://docs.google.com/viewer?a=v&q=cac ... JZ6g&pli=1
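
To make the false-positive problem concrete, here is a minimal sketch of one common mitigation: drop tiny connected components (specks) before computing the content box. This is neither Scan Tailor's actual algorithm nor SWT; the function name `content_box`, the `min_area` threshold, and the 0/1 grid representation are all assumptions made for illustration.

```python
from collections import deque


def content_box(page, min_area=4):
    """Return (top, left, bottom, right) around all dark components that
    have at least min_area pixels, or None if nothing qualifies.

    page is a list of rows holding 0 (background) or 1 (ink).
    """
    rows = len(page)
    cols = len(page[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    box = None
    for r in range(rows):
        for c in range(cols):
            if page[r][c] != 1 or seen[r][c]:
                continue
            # Flood-fill one 4-connected component starting at (r, c).
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and page[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) < min_area:
                continue  # too small: treat it as a speck, not content
            top = min(p[0] for p in pixels)
            bottom = max(p[0] for p in pixels)
            left = min(p[1] for p in pixels)
            right = max(p[1] for p in pixels)
            if box is None:
                box = (top, left, bottom, right)
            else:
                box = (min(box[0], top), min(box[1], left),
                       max(box[2], bottom), max(box[3], right))
    return box


# A single dot in the margin (column 0) next to a 2x3 block of "text".
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
print(content_box(page, min_area=1))  # the dot stretches the box to column 0
print(content_box(page, min_area=4))  # the dot is ignored
```

A fixed area threshold is crude: it would also discard real punctuation or page numbers on a sparse page, which is one reason a stroke-based method such as SWT is attractive.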
seasalt

Re: select contents problems

Post by seasalt »

Hello - I love Scan Tailor - thank you Tulon and others for such great work!!
I have some basic questions that I couldn't find answered in the user guide or via forum search:
1) Content box - is the idea that I include the header and footer / page number, or strictly the "content block"?
2) If I select only the content block (no headers/footers), will I still get the header/footer content in my output TIFF, or is it removed?
3) If I select the content block, then in the next step, page layout, is the correct logic to set the outer/hard margin to the exact size of my book?

Thank you for any help!
Cheers
Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: select contents problems

Post by Tulon »

seasalt wrote:1) Content box - is the idea that I include the header and footer / page number, or strictly the "content block"?
Everything you want to preserve is meant to be there.
seasalt wrote:2) If I select only the content block (no headers/footers), will I still get the header/footer content in my output TIFF, or is it removed?
Margins are cleared in all output modes except "Color / Grayscale" with "White margins" unchecked.
seasalt wrote:3) If I select the content block, then in the next step, page layout, is the correct logic to set the outer/hard margin to the exact size of my book?
It doesn't make sense for an e-book to have margins as large as in the original. You would be wasting screen space that way. Just choose the margins you are comfortable with.
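
The margin rule from the answer to question 2 can be written as a single predicate. A minimal sketch: the function name `margins_cleared` is hypothetical, and the mode string "Black and White" used in the example is an assumption; only "Color / Grayscale" and the "White margins" checkbox come from the reply above.

```python
def margins_cleared(output_mode: str, white_margins: bool) -> bool:
    """Return True if anything outside the content box is wiped in the output.

    Per the reply above, the only combination that keeps the original margin
    pixels is "Color / Grayscale" with "White margins" unchecked.
    """
    keeps_margins = output_mode == "Color / Grayscale" and not white_margins
    return not keeps_margins


# A header left outside the content box survives only in the last case.
print(margins_cleared("Black and White", white_margins=False))
print(margins_cleared("Color / Grayscale", white_margins=True))
print(margins_cleared("Color / Grayscale", white_margins=False))
```

In practice this means a header or page number has to be inside the content box to be sure of appearing in the output.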
seasalt

Re: select contents problems

Post by seasalt »

Thank you, Tulon.

When I am using Select Content, it appears to be "assessing / rendering" one page at a time.
Is there a way ST can "assess / render" all pages first, so that I can then use W to validate each page before any processing is executed?