Are my pics good enough for OCR --> searchable pdf

cgott4242
Posts: 17
Joined: 26 Jun 2012, 09:36
Number of books owned: 100
Country: USA

Post by cgott4242 » 06 Jul 2012, 15:58

I'm trying to convert a library of Hebrew books to searchable PDFs.
I've attached a couple of photos taken with a Canon A2200 14 MP camera,
and then the same files after Scan Tailor.
I've OCR'd them in ABBYY FineReader and the results were OK, but not yet good enough.

Is my "problem" on the picture side, i.e. do I need a better camera?
Or is it something else?

Pics attached.
Attachments
IMG_0025.tif
pic2 after scan tailor
IMG_0025.JPG
pic2 orig from camera
IMG_0395_vg.tif
pic1 after scan tailor
IMG_0395.JPG
pic1_orig from camera

daniel_reetz
Posts: 2776
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Post by daniel_reetz » 11 Jul 2012, 09:19

I don't have much (if any) experience with Hebrew OCR, but it looks like you have a challenging set of books here, because there is a lot of small content in toward the gutter. The edges of a lens always resolve the least and show the most aberration.

Your before/after pics look fairly typical of DIY Book Scanners using compact cameras, but the contrast seems particularly low. You could try lengthening your shutter time from 1/125s to maybe 1/80s to increase the total exposure. The paper is showing up gray because of the camera's metering; you can correct that either with the +/- exposure compensation (EV) setting or by setting the shutter speed manually. More exposure will help overcome noise and push more of the pixel values toward the right of the histogram, which improves overall image quality.
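
To put a number on that shutter change, here is a minimal Python sketch. It assumes aperture and ISO stay fixed, so the exposure change depends only on the ratio of the two shutter times; the function name `stops_between` is made up for illustration.

```python
import math

def stops_between(old_shutter_s: float, new_shutter_s: float) -> float:
    """Exposure change in stops (EV) when only the shutter time changes.

    Assumes aperture and ISO are held constant. A positive result means
    more light reaches the sensor.
    """
    return math.log2(new_shutter_s / old_shutter_s)

# The change suggested above: 1/125 s -> 1/80 s
change = stops_between(1 / 125, 1 / 80)
print(f"{change:+.2f} stops")  # prints "+0.64 stops"
```

That is roughly the same as dialing in +2/3 EV of exposure compensation, which is one step on cameras that adjust compensation in thirds of a stop.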

But it may be that for the fine text on the page, you simply don't have enough pixels.
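
One rough way to check whether there are enough pixels is to estimate how many pixels land on each inch of the page, and from that how tall the small type is in pixels. The sketch below is only an estimate under stated assumptions: the 3240 x 4320 image size is what a nominal 14 MP 4:3 sensor would give in portrait orientation, while the 6 x 9 inch page, the 80% frame fill, and the 6 pt type size are hypothetical placeholders to be replaced with real measurements.

```python
def effective_ppi(image_px, page_in, fill=1.0):
    """Approximate pixels per inch on the page.

    image_px: (width, height) of the photo in pixels
    page_in:  (width, height) of the page in inches, same orientation
    fill:     fraction of the frame the page spans along the limiting axis
    """
    (img_w, img_h), (page_w, page_h) = image_px, page_in
    # The tighter of the two axes limits the usable resolution.
    return min(img_w / page_w, img_h / page_h) * fill

def glyph_height_px(point_size, ppi):
    """Height in pixels of type with the given point size (1 pt = 1/72 inch)."""
    return point_size / 72 * ppi

# Hypothetical inputs: adjust to the actual camera, book, and framing.
ppi = effective_ppi(image_px=(3240, 4320), page_in=(6.0, 9.0), fill=0.8)
print(f"about {ppi:.0f} ppi on the page")
print(f"6 pt type is about {glyph_height_px(6, ppi):.0f} px tall")
```

A commonly quoted rule of thumb for OCR is around 300 ppi for ordinary print and more for small type, so if the estimate for the fine text near the gutter comes out well below that, sharper settings alone may not be enough.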

OCR is very difficult in general. You should expect some level of error no matter what you do. But I trust that what you are seeing is indeed excessive.

Post by cgott4242 » 12 Jul 2012, 11:20

Thanks for the input.
Yep, it is excessive. I also tried an S95 camera and got much better results (unfortunately the camera wasn't mine, so I can't use it in my scanner).

Post by daniel_reetz » 14 Jul 2012, 14:09

The S95 is only 10 MP, which suggests to me that you could get more out of your current setup with better settings.
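
For scale, linear resolution (pixels across a line of text) grows with the square root of the pixel count, not with the pixel count itself. A quick sketch using the nominal 14 MP and 10 MP figures from this thread, and assuming both cameras frame the page identically:

```python
import math

def linear_resolution_ratio(megapixels_a: float, megapixels_b: float) -> float:
    """How many times more pixels per unit length camera A records than B,
    assuming the same aspect ratio and the same framing of the page."""
    return math.sqrt(megapixels_a / megapixels_b)

ratio = linear_resolution_ratio(14, 10)
print(f"{ratio:.2f}x")  # prints "1.18x"
```

So on paper the 14 MP camera has about 18% more pixels across each character than the 10 MP one. This only compares pixel counts; it says nothing about lens sharpness, noise, or focus.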
