most fuss-free method for flattening book.

dylansmith
Posts: 9
Joined: 05 Dec 2011, 04:24
E-book readers owned: kindle
Number of books owned: 0

Post by dylansmith »

Hi everyone!

I am new to this forum and have seen many ingenious solutions for digitizing books. I am also looking forward to digitizing my collection, but I have a few quirks:

- I'm not intending to build my own frames/platens. Is there any place where I can just buy a DIY-ed structure? Or I could make my own, but only a REALLY simple one (see below for details).

- I intend to use only ONE camera, taking 2 pages at a time, so I need something that stretches the book out flat horizontally.

- I'm not looking for max archival quality, but rather just something that turns out legible and highlightable/copy-able as normal text.

- What are the steps involved in making the final output "more responsive when highlighting/flipping" in PDF? I have a few scanned books and all of them perform differently; some lag really badly. The file sizes are mostly in the 40-60 MB range for a 500-page book.

- Comparing Acrobat ClearScan and ABBYY, which OCR is better? I've tried both and neither is that good: ClearScan gives good highlighting (i.e. I can highlight most of the text, but the copied text turns out garbled), whereas ABBYY provides higher accuracy. I understand most of you guys use ScanTailor, but I just want to get an opinion on which you would prefer.
dtic
Posts: 464
Joined: 06 Mar 2010, 18:03

Post by dtic »

Hi dylan,
If you have or can get a camera that can be triggered by some sort of remote (like a foot switch or a button remote) then a very simple method is to lay the book flat on the table and position the camera above it on a tripod. If the tripod legs are in the way then lean the tripod and secure it to a wall or something with some string. Then press the book spread flat with your fingers at the right and left edges of the book and take the shot.

An alternative method which takes a bit longer but gets flatter shots is to press down the page spread with a piece of glass or clear plastic. Glass from a picture frame might have the right dimensions.


Once you get the hang of the above method you can work quickly and get surprisingly good images that way.

Process the photos in http://scantailor.sourceforge.net/ before doing OCR and PDF conversion. If the books are mostly text and if the shots are flat enough to not need the (still a bit experimental) dewarping features then you can work through the ScanTailor steps pretty quickly.
Drake Ravensmith
Posts: 70
Joined: 04 Jan 2011, 05:16
E-book readers owned: Kindle 3
Number of books owned: 0

Post by Drake Ravensmith »

I'm not really sure what you mean here by highlightable/copy-able text. I pre-process my JPEGs with Scan Tailor and then feed the TIFF files into ABBYY to OCR them. It's not a perfect program, but from what I understand most OCR programs are flawed. The free ones I tested had me begging for something professional. The font can really determine how many errors you get.

Also, I must recommend against taking one photo of both pages. I 'acquired' flatbed scans of a book I legitimately own, figuring I could save a little time. The pages looked fairly flat and I knew the paperback wasn't terribly thick. ABBYY completely failed to read the curved sections, slight as they were. With my scanner, I occasionally get errors (mostly on right-side pages near the spine) no matter how straight the image appears. Often, quotation marks are missed, although the Doctor Who books I scan use one quote mark for speech rather than two, so that may contribute to the problem.

Here is the most basic, bare bones, easy as dirt scanner I know of.

http://www.instructables.com/id/Bargain ... board-Box/

The number one time waster in my setup is discovering that, for whatever reason, a page wasn't in full focus when a picture was taken. I recommend using an AV output, if your camera supports it, so you can get a better idea of whether this has happened. Since a basic book scanner is so portable and you only want to use one camera, you can just use your living room television. I'll be watching this thread if you have any other questions.

Post by dylansmith »

Hi dtic,

That is exactly what I did, actually. I used a professional DSLR (Canon 5D Mk II) with a macro lens to capture maximum detail and made the book as flat as it can be. The weird thing is that the OCR can sometimes read bent text (in the middle) and yet fail to read flat text (at the extreme sides) correctly.

How much more would it help if I converted my RAW files to TIFF instead of JPEG? I'm not too concerned about the file size, but rather about how much more the final PDF would lag when I flip or highlight pages (I have a quad-core i7 + 16 GB RAM and my 80 MB, 300-page PDF still lags!).

I've tried ScanTailor before but it doesn't seem to improve the OCR by much. What does ScanTailor have that ABBYY doesn't? Am I doing something wrong?

Also, I have plexiglass which I can lay on top of the book, but wouldn't that affect the readability for the OCR software?
Last edited by dylansmith on 07 Dec 2011, 01:11, edited 2 times in total.

Post by dylansmith »

I use manual focus for the shots, so there's no problem with focusing.

Is there anything specific that I must do in ScanTailor before going to ABBYY for OCR?

Also, have you guys heard about this company, blue leaf? I saw their samples and was quite impressed by the output: they make it look like a legit ebook from the publisher itself. They also claim to have higher OCR accuracy than Acrobat. What software do they really use? Any idea?

Post by Drake Ravensmith »

Never heard of blue leaf.

This video explains scan tailor far better than I ever could.

vimeo.com/12524529

I'm not the least bit familiar with RAW files, but I do know that JPEGs are compressed, so I wouldn't think that would help. Scan Tailor changes my JPEGs into TIFF files. This turns about 750 MB of JPEGs (around 250 pages with an 8 MP camera) into 45 MB of TIFF files. The resulting PDF is 1.78 MB, which would probably solve your lag problem.
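
To put those figures in per-page terms, here is a quick back-of-the-envelope sketch in Python. It assumes exactly 250 pages (the post only says "around 250") and 1 MB = 1024 KB; the variable and function names are just for illustration.

```python
# Totals quoted in the post; the page count is an assumption ("around 250").
PAGES = 250
jpeg_total_mb = 750.0   # camera JPEGs going into Scan Tailor
tiff_total_mb = 45.0    # TIFF files coming out of Scan Tailor
pdf_total_mb = 1.78     # final PDF


def per_page_kb(total_mb, pages):
    """Average size of one page in kilobytes (1 MB = 1024 KB)."""
    return total_mb * 1024 / pages


jpeg_kb = per_page_kb(jpeg_total_mb, PAGES)
tiff_kb = per_page_kb(tiff_total_mb, PAGES)
pdf_kb = per_page_kb(pdf_total_mb, PAGES)

print(f"JPEG: {jpeg_kb:.0f} KB per page")
print(f"TIFF: {tiff_kb:.0f} KB per page")
print(f"PDF:  {pdf_kb:.1f} KB per page")
print(f"JPEG to PDF reduction: about {jpeg_total_mb / pdf_total_mb:.0f}x")
```

A viewer that only has to handle a few KB per page has far less work to do than one decoding several MB per page, which is the likely reason the smaller PDF feels more responsive.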

I use the manual focus on my cameras as well. I usually get at least three instances where one page or the other is out of focus. I blame aliens.

Post by dylansmith »

Hmm, I don't understand: how can the final PDF made from TIFF files be so much smaller than one made from JPEGs?

Post by dtic »

ScanTailor "cleans up" the page and can turn color or grayscale images into black and white. That reduces the file size. Just download it and try it out, or see this video: http://vimeo.com/12524529
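
A rough illustration of why black and white takes less space: a grayscale pixel needs a full byte, while a black-or-white pixel needs only one bit. The Python sketch below uses a fixed threshold of 128 and a made-up 16x4 pixel "page"; both are assumptions for illustration, and this is not a description of how ScanTailor itself is implemented.

```python
def binarize(gray_rows, threshold=128):
    """Map each 0-255 grayscale pixel to 1 (ink) or 0 (paper)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray_rows]


def pack_bits(bit_rows):
    """Pack 8 black-and-white pixels into each byte, row by row."""
    out = bytearray()
    for row in bit_rows:
        for i in range(0, len(row), 8):
            chunk = row[i:i + 8]
            byte = 0
            for bit in chunk:
                byte = (byte << 1) | bit
            byte <<= 8 - len(chunk)  # pad a short final chunk with paper
            out.append(byte)
    return bytes(out)


# Hypothetical tiny "page": light paper with a two-pixel-wide dark stroke.
page = [[230] * 16 for _ in range(4)]
for row in page:
    row[5] = row[6] = 40

gray_bytes = len(page) * len(page[0])  # one byte per grayscale pixel
bw = pack_bits(binarize(page))

print("grayscale bytes:", gray_bytes)
print("black-and-white bytes:", len(bw))
```

That is an 8x saving before any compression. Bilevel images can also be stored with compression schemes designed for them (TIFF supports CCITT Group 4, for example), which shrinks mostly-white text pages much further.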