Are my pics good enough for OCR --> searchable pdf

Convert page images into searchable text. Talk about software, techniques, and new developments here.

Moderator: peterZ

cgott4242
Posts: 17
Joined: 26 Jun 2012, 09:36
Number of books owned: 100
Country: USA

Post by cgott4242 »

I'm trying to convert a library of Hebrew books to searchable PDFs.
I've attached a couple of photos taken with a Canon A2200 14MP camera,
along with the same files after Scan Tailor.
I OCR'd them in ABBYY FineReader and the results were OK, but not yet good enough.

Is my "problem" on the picture side, i.e. do I need a better camera?
Or is it something else?

pics attached
Attachments
pic2 after scan tailor
pic2 orig from camera
IMG_0395_vg.tif (760.41 KiB): pic1 after scan tailor
pic1_orig from camera
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Are my pics good enough for OCR --> searchable pdf

Post by daniel_reetz »

I don't have much/any experience with Hebrew OCR - but it looks like you have a challenging set of books here, because there is a lot of small content running in toward the gutter. The edges of the lens always resolve the least detail and show the most aberration.

Your before/after pics look fairly typical of DIY Book Scanners using compact cameras, but the contrast seems particularly low. You could try lengthening your shutter time from 1/125s to maybe 1/80s to increase the total exposure. The paper is showing up gray because of the camera's metering; you can correct this with the +/- exposure compensation (EV) setting or by controlling the shutter speed manually. This will help overcome noise and push more pixels toward the right of the histogram, which improves overall image quality.
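For a rough sense of scale, the shutter change suggested above can be expressed in stops, and the "gray paper" symptom can be checked by counting how many pixels land near the bright end of the histogram. This is only a sketch: the function names and the cutoff of 200 for "paper white" are assumptions made for illustration, not settings from the camera or from Scan Tailor.

```python
import math

def stops_gained(old_shutter_s, new_shutter_s):
    """Exposure change in stops (EV) when only the shutter time changes."""
    return math.log2(new_shutter_s / old_shutter_s)

def bright_fraction(gray_values, threshold=200):
    """Share of 8-bit gray values at or above `threshold` (an assumed
    cutoff for 'paper white'); a low share hints at an underexposed page."""
    values = list(gray_values)
    if not values:
        raise ValueError("no pixel values given")
    return sum(v >= threshold for v in values) / len(values)

gain = stops_gained(1 / 125, 1 / 80)
print(f"1/125s -> 1/80s is about {gain:+.2f} EV")

# Hypothetical sample: mostly mid-gray paper with a few dark text pixels.
sample = [150] * 90 + [30] * 10
print(f"bright pixels: {bright_fraction(sample):.0%}")
```

Going from 1/125s to 1/80s works out to roughly two thirds of a stop, which is a modest change. If the paper still reads as mid-gray afterwards, a further step of exposure compensation may be worth trying.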

But it may be that for the fine text on the page, you simply don't have enough pixels.
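Whether there are "enough pixels" can be estimated by working out how many pixels per inch actually land on the page. The sketch below assumes a 14 MP 4:3 frame of 4320 x 3240 pixels, a hypothetical 9-inch-tall page, and a page that fills 80% of the frame; substitute your own measurements. OCR guidance commonly suggests around 300 ppi for ordinary body text and more for small print, but treat that as a rule of thumb rather than a hard limit.

```python
def effective_ppi(sensor_px, page_in, fill=0.8):
    """Approximate pixels per inch landing on the page along one axis.

    sensor_px: pixel count along that axis of the image
    page_in:   page length along the same axis, in inches
    fill:      fraction of the frame the page occupies (assumed value)
    """
    if page_in <= 0 or not 0 < fill <= 1:
        raise ValueError("page_in must be > 0 and fill in (0, 1]")
    return sensor_px * fill / page_in

def glyph_height_px(ppi, point_size):
    """Nominal height in pixels of type set at `point_size` (1 pt = 1/72 in)."""
    return ppi * point_size / 72

# Assumed numbers: long side of a 4320 x 3240 frame along a 9-inch page.
ppi = effective_ppi(4320, 9.0)
print(f"about {ppi:.0f} ppi on the page")
print(f"6 pt type is about {glyph_height_px(ppi, 6):.0f} px tall")
```

A larger page, or a page that fills less of the frame, lowers the figure quickly, so it is worth plugging in the real book dimensions before deciding the camera is the limit.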

OCR is very difficult in general. You should expect some level of error no matter what you do. But I trust that what you are seeing is indeed excessive.
cgott4242
Posts: 17
Joined: 26 Jun 2012, 09:36
Number of books owned: 100
Country: USA

Re: Are my pics good enough for OCR --> searchable pdf

Post by cgott4242 »

Thanks for the input.
Yep, it is excessive. I also tried an S95 camera and got much better results (unfortunately the camera wasn't mine, so I can't use it in my scanner).
daniel_reetz
Posts: 2812
Joined: 03 Jun 2009, 13:56
E-book readers owned: Used to have a PRS-500
Number of books owned: 600
Country: United States

Re: Are my pics good enough for OCR --> searchable pdf

Post by daniel_reetz »

The S95 is only 10 MP, which suggests to me you could get more out of your current setup with better settings.
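To put the megapixel comparison in numbers: pixel count scales with area, so the resolution along each edge of the frame goes with its square root. A minimal sketch, assuming both sensors share the same aspect ratio (the function name is made up for illustration):

```python
import math

def linear_resolution_ratio(megapixels_a, megapixels_b):
    """How many times more pixels per side sensor A has than sensor B,
    assuming both have the same aspect ratio."""
    return math.sqrt(megapixels_a / megapixels_b)

ratio = linear_resolution_ratio(14, 10)
print(f"14 MP vs 10 MP: {ratio:.2f}x the pixels along each edge")
```

On paper the 14 MP camera has somewhat more pixels along each edge than the 10 MP one, so raw pixel count alone is unlikely to explain the better S95 results.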