Daniel Reetz, the founder of the DIY Book Scanner community, has recently started making videos of prototyping and shop tips. If you are tinkering with a book scanner (or any other project) in your home shop, these tips will come in handy. https://www.youtube.com/channel/UCn0gq8 ... g_8K1nfInQ

select contents problems

Scan Tailor specific announcements, releases, workflows, tips, etc. NO FEATURE REQUESTS IN THIS FORUM, please.
jinjin12
Posts: 13
Joined: 01 Nov 2010, 01:09

select contents problems

Post by jinjin12 » 06 Feb 2011, 15:59

I've tested a couple of pages. For some pages, Select Content works well, but for others it only selects a couple of paragraphs and omits a lot of other information on the page. That's OK for a few pages because I can fix them myself, but with hundreds of pages this is a big problem. How can I solve it? Is the problem my input? Is it due to the lack of proper lighting on the pages?

Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: select contents problems

Post by Tulon » 06 Feb 2011, 16:07

jinjin12 wrote: Is the problem my input?
I can't tell until I've seen it. Also, tell me which input DPI you are specifying.
Scan Tailor experimental doesn't output 96 DPI images. It's just what your software shows when DPI information is missing. Usually what you get is input DPI times the resolution enhancement factor.
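The DPI rule of thumb above amounts to a single multiplication, plus a fallback value that a viewer may display when the file carries no DPI information at all. A minimal sketch; the helper names `expected_output_dpi` and `displayed_dpi` are hypothetical and not part of Scan Tailor, and the enhancement factor is assumed to be a plain multiplier (for example 2 for doubling):

```python
from typing import Optional

# Value some viewers show when an image has no DPI field (per the note above).
FALLBACK_DPI = 96


def expected_output_dpi(input_dpi: float, enhancement_factor: float) -> float:
    """Usual output DPI: input DPI times the resolution enhancement factor."""
    if input_dpi <= 0 or enhancement_factor <= 0:
        raise ValueError("input_dpi and enhancement_factor must be positive")
    return input_dpi * enhancement_factor


def displayed_dpi(stored_dpi: Optional[float]) -> float:
    """What a viewer reports: the stored DPI, or the fallback if none is stored."""
    return stored_dpi if stored_dpi is not None else FALLBACK_DPI
```

For a 300 DPI scan with a 2x enhancement factor, `expected_output_dpi(300, 2)` gives 600. A viewer reporting 96 is most likely falling back because the DPI field is missing, not reading an actual 96 DPI image.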

Anonymous1

Re: select contents problems

Post by Anonymous1 » 06 Feb 2011, 16:27

Content selection is also a big issue for me, but only on really old books with tons of marks and smudges in the margins. I tried playing with it, and I think the biggest problem with content selection is finding the text blocks. The algorithm that's currently used finds a lot of false positives (like dots next to the content areas), but I'm not the one to fix it.
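To illustrate the false-positive problem: a detector that takes the bounding box of every dark pixel gets stretched by a single stray dot next to the content area. One common mitigation, shown here as a minimal sketch, is to discard connected blobs below a size threshold before computing the box. This is not Scan Tailor's actual algorithm; the function name `content_box` and the `min_area` threshold are hypothetical, and the page is assumed to be already binarized into a grid of 0 (background) and 1 (ink) values.

```python
from collections import deque
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def content_box(image: List[List[int]], min_area: int = 3) -> Optional[Box]:
    """Bounding box of all 8-connected ink blobs with at least min_area pixels.

    Blobs smaller than min_area are treated as specks (dots, smudges) and ignored.
    Returns None if no blob is large enough.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    box: Optional[Box] = None

    for y in range(height):
        for x in range(width):
            if not image[y][x] or seen[y][x]:
                continue
            # Flood-fill one blob, tracking its pixel count and extent.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = 0
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                pixels += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if pixels < min_area:
                continue  # too small: likely a dot or smudge, not text
            if box is None:
                box = (top, left, bottom, right)
            else:
                box = (min(box[0], top), min(box[1], left),
                       max(box[2], bottom), max(box[3], right))
    return box
```

On a small page with a lone dot in the top-left corner and a 2x3 block of "text" lower down, the default `min_area=3` ignores the dot and returns only the text's box; with `min_area=1` the dot stretches the box to the page corner, which is the kind of false positive described above. Choosing the threshold is the hard part: too high and small legitimate marks such as page numbers get dropped.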

Here's a research paper on SWT (Stroke Width Transform), which would work perfectly for detecting text quickly: http://docs.google.com/viewer?a=v&q=cac ... JZ6g&pli=1

seasalt

Re: select contents problems

Post by seasalt » 04 May 2011, 18:57

Hello! I love Scan Tailor. Thank you, Tulon and others, for such great work!
I have some basic questions that I couldn't find answered in the user guide or by searching the forum:
1) Content box: is the idea that I include the header, footer, and page number, or strictly the main content block?
2) If I select only the content block (no headers/footers), will the header/footer content still appear in my output TIFF, or is it removed?
3) After selecting the content block, in the next step (Page Layout), am I supposed to set the outer (hard) margins to the exact size of my book?

Thank you for any help!
Cheers

Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: select contents problems

Post by Tulon » 08 May 2011, 03:19

seasalt wrote: 1) Content box: is the idea that I include the header, footer, and page number, or strictly the main content block?
Everything you want to preserve is meant to be there.
seasalt wrote: 2) If I select only the content block (no headers/footers), will the header/footer content still appear in my output TIFF, or is it removed?
Margins are cleared in all output modes except "Color / Grayscale" with "White margins" unchecked.
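The margin-clearing rule above fits in a one-line predicate. A sketch only: the mode strings are assumed to match the labels in Scan Tailor's Output stage, and the function name `margins_cleared` is hypothetical.

```python
def margins_cleared(output_mode: str, white_margins: bool) -> bool:
    """True if content outside the content box is cleared in the output.

    Margins are cleared in every output mode except "Color / Grayscale"
    with the "White margins" option unchecked.
    """
    return not (output_mode == "Color / Grayscale" and not white_margins)
```

Written this way, it is explicit that "White margins" only matters in Color / Grayscale mode; every other mode clears the margins regardless of that checkbox.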
seasalt wrote: 3) After selecting the content block, in the next step (Page Layout), am I supposed to set the outer (hard) margins to the exact size of my book?
It doesn't make sense for an e-book to have margins as large as in the original. You would be wasting screen space that way. Just choose the margins you are comfortable with.

seasalt

Re: select contents problems

Post by seasalt » 02 Jun 2011, 17:21

Thank you, Tulon.

When I use Select Content, it appears to "assess/render" one page at a time.
Is there a way for ST to "assess/render" all pages first, so that I can then use W to validate each page before any further processing is executed?
