
General thoughts

gmhberg
Posts: 1
Joined: 04 Mar 2014, 00:52

Post by gmhberg » 04 Oct 2009, 16:10

I read with interest what you all have built and invented.
My first question: What is the minimum image quality the OCR software can handle? Could we get by with a couple of webcams tethered to the computer? That would mean less cost and less weight.
If the cameras are fixed to the glass, the distance to the book is always the same and autofocus is not necessary. As the object is flat, you don't have to close the aperture to get more depth of field.
If the book is opened to more than 90 degrees, it might be easier to frame each page without the opposite page disturbing the OCR.
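
To put rough numbers on the fixed-focus idea, here is a minimal Python sketch using the standard thin-lens depth-of-field approximation. The focal length, f-number, camera-to-page distance, and circle of confusion below are illustrative assumptions, not measurements from any particular webcam.

```python
import math


def depth_of_field_mm(focal_mm, f_number, subject_mm, coc_mm):
    """Return (near, far) limits of acceptable focus, in mm from the lens.

    Uses the usual hyperfocal-distance approximation. All inputs are
    assumed values that you would replace with your own camera's specs.
    """
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    near = subject_mm * (hyperfocal - focal_mm) / (hyperfocal + subject_mm - 2 * focal_mm)
    if subject_mm >= hyperfocal:
        far = math.inf  # everything beyond the near limit is acceptably sharp
    else:
        far = subject_mm * (hyperfocal - focal_mm) / (hyperfocal - subject_mm)
    return near, far


if __name__ == "__main__":
    # Hypothetical small-sensor camera: 6 mm lens, wide open at f/2.8,
    # fixed 400 mm from the page, 0.005 mm circle of confusion.
    near, far = depth_of_field_mm(6.0, 2.8, 400.0, 0.005)
    print(f"in focus from {near:.0f} mm to {far:.0f} mm")
```

If the in-focus band is several centimetres deep even wide open, a page pressed flat against the glass sits comfortably inside it, which is the point made above about not needing to stop down.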

spamsickle
Posts: 596
Joined: 06 Jun 2009, 23:57

Re: General thoughts

Post by spamsickle » 04 Oct 2009, 16:25

I suspect webcams would provide adequate quality for OCR, if that's your goal. Certainly, with post-processing software that could clean up the image, it should work as well as scanned pages on reasonably new and clean books. Even on scanned images, OCR is going to misread a word or two per page, in my experience. On older books with foxing and yellowing, this is an even bigger problem, so for me, the end product is not usually an OCR text, but an enhanced version of the original image. I can't say whether a webcam's output would be something I'd want to read unless I saw some samples.
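
A back-of-envelope way to judge whether a given webcam is in the right range is to work out how many pixels land on each inch of the page. The sketch below is only that. The page size, the 90% frame fill, and the 300 DPI target (a common rule of thumb for OCR input, not a hard limit) are all assumptions.

```python
def effective_dpi(px_long, px_short, page_long_in, page_short_in, fill=0.9):
    """Approximate pixels per inch on the page.

    `fill` is the assumed fraction of the frame the page occupies along
    each axis. The result is limited by whichever axis is tighter.
    """
    dpi_long = px_long * fill / page_long_in
    dpi_short = px_short * fill / page_short_in
    return min(dpi_long, dpi_short)


def pixels_needed(page_in, target_dpi, fill=0.9):
    """Sensor pixels required along one axis to reach target_dpi."""
    return page_in * target_dpi / fill


if __name__ == "__main__":
    # Hypothetical 9 x 6 inch page shot with two common webcam resolutions.
    for width, height in [(640, 480), (1600, 1200)]:
        dpi = effective_dpi(width, height, 9.0, 6.0)
        print(f"{width}x{height}: about {dpi:.0f} DPI on the page")
    print(f"long side needed for 300 DPI: about {pixels_needed(9.0, 300):.0f} px")
```

Running this with your own camera's resolution and your typical page size gives a quick sanity check before buying anything. Real results also depend on lens sharpness, compression, and lighting, which this arithmetic ignores.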

I have my cameras fixed to the glass, and am using fixed-focus and a wide-open aperture. I'm happy with the results. As I've implemented it, with counterweights, the scanning is not tiring, but I do have to concede that this design is still not perfect. It's more prone to registration problems -- i.e., pages can jitter from one frame to the next in the raw image. That's kind of why I wrote YAPP, to clip the content out and remove the jiggles, and Scan Tailor also seems to do a good job of dealing with those artifacts on the back end.

I'd be concerned about how to light the book so that the reflections of the lights didn't show up in the images if you opened the books more than 90 degrees. I'm sure it can be done, but it would require some thought. I don't think getting the opposite page in the original snapshot is a big problem; that part should be eliminated before you perform an OCR step anyway.

StevePoling
Posts: 290
Joined: 20 Jun 2009, 12:19
E-book readers owned: SONY PRS-505, Kindle DX
Number of books owned: 9999
Location: Grand Rapids, MI

Re: General thoughts

Post by StevePoling » 06 Oct 2009, 00:34

What's foxing?

phaedrus
Posts: 56
Joined: 04 Mar 2014, 00:52

Re: General thoughts

Post by phaedrus » 06 Oct 2009, 15:26

Hi, it's a sort of brown spotting that appears randomly throughout a book. This article describes it far better than I could:

http://en.wikipedia.org/wiki/Foxing

Cheers, P.
