Brand New: Initial Impressions & Searching for Answers

Share your software workflow. Write up your tips and tricks on how to scan, digitize, OCR, and bind ebooks.

Moderator: peterZ

natecbc
Posts: 4
Joined: 20 Oct 2014, 18:10
E-book readers owned: Kindle Paperwhite
Number of books owned: 4300
Country: United States

Brand New: Initial Impressions & Searching for Answers

Post by natecbc »

Hello Everyone!

I'm brand new to this world of book scanning and print content digitization. I've been trying to learn about all of the different DIY and commercial offerings out there, and I have to say it is a bit overwhelming. I'll get to the point where I think I have seen all the commercial offerings, and then another product will pop up in a YouTube suggestion or Google search. The same has proven true of the DIY sector.

I have a few comments about my findings so far and some questions:

BEST CURRENT SCAN SET-UPS:

Commercial: Atiz Bookdrive Pro (http://pro.atiz.com/). I like that it uses DSLR cameras (though I understand this has diminishing value for normal text-only books), has laser-guided focus, offers many content size options, and auto-shoots as soon as the platen comes down. The bundled software seems to streamline the process pretty well. It seems significantly cheaper than many other commercial offerings (BCR 100, ScanRobot 2.0, etc.), but the Archivist seems to achieve the same thing, minus DSLR support and the laser focus, for 1/10th the price.

DIY Option 1: "The Archivist" (http://diybookscanner.myshopify.com/pro ... er-kit-2-0) seems to be the best DIY option in terms of build quality and longevity. If someone were to start a local book scanning business, it would look good and professional. Alternatively, an industrious person could replicate the build without needing to buy the pieces again. I know Daniel originally made a CAD file available, but I don't think the Archivist's current design is available for someone who wants to build it all on their own.

DIY Option 2: "The Easy Book Scanner" (http://www.youtube.com/watch?v=ne-h7FTMZBk) is another DIY option, and it seems to offer the best ratio of scan efficiency to build cost and time. I like the use of native camera triggers.

DIY Option 3: "Booksorber". The simplicity of flat shooting with a basic tripod, light, and remote trigger set-up seems REALLY quick. I'm not advocating that software, since I'm still getting familiar with the post-processing tools, but their video gave me the idea for a really simple one-camera shooting set-up.

QUESTIONS:

What are the "technical" advantages of a two-camera set-up over a one-camera set-up shooting two pages at once? Doesn't the post-processing software have the ability to either process an image of one page or split an image of two pages?

Is there an ideal focal length for a DSLR lens? Commercial set-ups seem to be using 35-50mm lenses.

Is there an ideal file format to render images into when building eReader-ready files like .mobi or .azw for Kindle? Perhaps exporting to .html or .txt?

I intend to scan primarily academic works whose footnotes are contextually relevant to their content. Does anyone have a workflow that keeps footnotes in context for eReader use? Perhaps making the numeric reference touchable, with the note shown in a scrolling lightbox?

Has anyone in this community explored using laser scanning and 3D printing for prototyping builds and reverse engineering old ones?

The web swears by ABBYY FineReader, but is there a comparable OCR package that's free?

Thank you for your time!

-Nate
dpc
Posts: 379
Joined: 01 Apr 2011, 18:05
Number of books owned: 0
Location: Issaquah, WA

Re: Brand New: Initial Impressions & Searching for Answers

Post by dpc »

What are the "technical" advantages of a two-camera set-up over a one-camera set-up shooting two pages at once? Doesn't the post-processing software have the ability to either process an image of one page or split an image of two pages?
Yes, this can be handled in post-processing. However, flattening the book puts a strain on the binding and may still leave you with curved inner margins, so shooting the book in a 'v' shaped cradle is generally a better option. Shooting both pages at once with a single camera also halves the resolution available for each page.
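
To put rough numbers on the resolution point, here is a minimal sketch. The sensor size and page size are made-up example values, not figures from any particular rig, and the calculation assumes the frame is filled edge to edge with the camera rotated to suit the subject.

```python
def pixels_per_inch(sensor_px, subject_in):
    """Best-case PPI when the subject fills the frame.

    sensor_px:  (width, height) of the sensor in pixels
    subject_in: (width, height) of the framed area in inches
    """
    sensor_long, sensor_short = max(sensor_px), min(sensor_px)
    subject_long, subject_short = max(subject_in), min(subject_in)
    # Whichever axis is tighter limits the usable resolution.
    return min(sensor_long / subject_long, sensor_short / subject_short)


SENSOR = (5184, 3456)  # hypothetical 18 MP camera (assumption)
PAGE = (6.0, 9.0)      # hypothetical page size in inches (assumption)

one_page_per_shot = pixels_per_inch(SENSOR, PAGE)
two_pages_per_shot = pixels_per_inch(SENSOR, (PAGE[0] * 2, PAGE[1]))

print(f"one page per shot:  {one_page_per_shot:.0f} PPI")
print(f"two pages per shot: {two_pages_per_shot:.0f} PPI")
```

With these example numbers a single page gets 576 PPI, while a two-page spread gets 384 PPI, which works out to fewer than half the pixels on each page. Real set-ups will do a little worse, since you never fill the frame perfectly.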

Realize that you could still shoot a book in a 'v' cradle with a single camera, but you'd need to shoot all the odd pages, then flip the book around and shoot the even pages. With two cameras you'd only have to page through the book once. You'd have to decide how much your time is worth compared to buying a second camera.
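
If you do go the single-camera, two-pass route, the two sets of images have to be merged back into reading order afterwards. A minimal sketch of that step follows; the function name, the file names, and the `even_reversed` switch are all hypothetical, and it assumes each pass was shot in a consistent order with no skipped pages.

```python
from itertools import zip_longest


def interleave_passes(odd_pass, even_pass, even_reversed=False):
    """Merge an odd-page pass and an even-page pass into reading order.

    Set even_reversed=True if the second pass was shot from the back
    of the book towards the front (an assumption about your workflow).
    """
    if even_reversed:
        even_pass = list(reversed(even_pass))
    merged = []
    # zip_longest pads the shorter pass with None, so a book with one
    # more odd page than even pages still merges cleanly.
    for odd, even in zip_longest(odd_pass, even_pass):
        if odd is not None:
            merged.append(odd)
        if even is not None:
            merged.append(even)
    return merged


odd_files = ["odd_001.jpg", "odd_002.jpg", "odd_003.jpg"]
even_files = ["even_001.jpg", "even_002.jpg"]
print(interleave_passes(odd_files, even_files))
```

A missed or doubled shot in either pass will throw off everything after it, so it is worth spot-checking a few page numbers in the merged list before processing the whole book.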

If it were me I'd just shoot the texts and produce a PDF from the images and not worry about OCR. You're talking about possibly a lot of post-processing to OCR an academic text with footnotes. You may want to have a look at the forums at mobileread.com and see what those guys think. They also know about a lot of the free tools available for producing ebooks.
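
For anyone who does go the OCR route, one way to keep a footnote attached to its reference is the EPUB 3 `noteref` / `footnote` markup. The sketch below only generates the two XHTML fragments for a single note. The helper name and the sample note text are made up, the enclosing document is assumed to declare the `epub` namespace, and whether a particular reader shows the note as a popup is also an assumption you would need to test on your own device.

```python
from html import escape


def footnote_markup(number, note_text):
    """Return (reference, note) XHTML fragments for one footnote."""
    note_id = f"fn{number}"
    # The in-text reference: a link that points at the note's id.
    reference = f'<a epub:type="noteref" href="#{note_id}">{number}</a>'
    # The note itself, escaped so stray &, < and > stay valid XHTML.
    note = (
        f'<aside epub:type="footnote" id="{note_id}">'
        f"<p>{escape(note_text)}</p></aside>"
    )
    return reference, note


ref, note = footnote_markup(1, "Sample note text with an & in it.")
print(ref)
print(note)
```

The hard part is still upstream of this: getting the OCR output to tell you which superscript belongs to which note, which is where most of the hand work goes.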

I bought ABBYY FineReader Pro to OCR my scanned images and it didn't do an adequate job with formatting and required a LOT of hand-holding to get acceptable results. In the end I ditched the OCR and used Acrobat XI Pro to produce PDFs and that's what I read on my tablet. I'll also never buy another product from ABBYY due to their idiotic copy protection that kept me from being able to install the software over a weekend when their authentication servers were down and there was no one in their office that could help me until Monday.
natecbc
Posts: 4
Joined: 20 Oct 2014, 18:10
E-book readers owned: Kindle Paperwhite
Number of books owned: 4300
Country: United States

Re: Brand New: Initial Impressions & Searching for Answers

Post by natecbc »

dpc,

Thanks for the feedback. The halving of the resolution is an obvious disadvantage that I can't believe I didn't think of. The margin warping is also good to be aware of. I'm guessing that, in your experience, post-processing doesn't always do a great job of fixing warping?

Is there a particular scanner build that you use? Do you happen to know of anyone who has done something similar to the guys at Google, where they used scanner sensors instead of cameras?
rkomar
Posts: 98
Joined: 12 May 2013, 16:36
E-book readers owned: PRS-505, PocketBook 902, PRS-T1, PocketBook 623, PocketBook 840
Number of books owned: 3000
Country: Canada

Re: Brand New: Initial Impressions & Searching for Answers

Post by rkomar »

Older books may have pages that are warped by moisture. With a platen, you can flatten them out reasonably well. Without a platen, you may see uneven shading across the page if the lighting is oblique to it, or even distorted text in extreme cases. (I'm thinking of those designs that just have a frame holding down the pages of the book.)
dpc
Posts: 379
Joined: 01 Apr 2011, 18:05
Number of books owned: 0
Location: Issaquah, WA

Re: Brand New: Initial Impressions & Searching for Answers

Post by dpc »

Also the margin warping is good to be aware of. I'm guessing that in your experience post-processing doesn't always do a great job of fixing warping?
Oh it'll work OK, but it's not as good as getting things right up front. If you're going to pump things through OCR it might not make a difference.