I'm trying to convert a library of Hebrew books to searchable PDFs.
I've attached a couple of photos taken with a Canon A2200 14 MP camera,
and then the files after Scan Tailor.
I've OCR'd them in ABBYY FineReader and the results were OK, but not yet good enough.
Is my "problem" on the picture side, i.e. do I need a better camera?
Or is it something else?
pics attached
Are my pics good enough for OCR --> searchable pdf
Moderator: peterZ
- Attachments
- IMG_0395_vg.tif
- pic1 after Scan Tailor
- (760.41 KiB)
- daniel_reetz
- Posts: 2812
- Joined: 03 Jun 2009, 13:56
- E-book readers owned: Used to have a PRS-500
- Number of books owned: 600
- Country: United States
- Contact:
Re: Are my pics good enough for OCR --> searchable pdf
I don't have much, if any, experience with Hebrew OCR, but it looks like you have a challenging set of books here because there is a lot of small text in toward the gutter. The edges of the lens always resolve the least and have the most aberration.
Your before/after pics look fairly typical of DIY Book Scanners using compact cameras, but the contrast seems particularly low. You could try increasing your shutter time from 1/125s to maybe 1/80s to increase the total exposure. The paper is showing up gray due to the camera's metering; you can correct that with the +/- exposure compensation (EV) setting or by controlling the shutter speed manually. This will help overcome noise and put more pixels into the "right of the histogram", which improves overall image quality.
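For a rough sense of how big that change is, here is a small sketch that converts a shutter-time change into stops. It assumes aperture and ISO stay fixed, and the function name `stops_gained` is just made up for illustration.

```python
import math

def stops_gained(old_shutter_s: float, new_shutter_s: float) -> float:
    """Exposure change in stops when only the shutter time changes.

    Assumes aperture and ISO are unchanged. Positive means more light.
    """
    return math.log2(new_shutter_s / old_shutter_s)

# The change suggested above: 1/125 s -> 1/80 s
change = stops_gained(1 / 125, 1 / 80)
print(f"{change:+.2f} stops")  # prints "+0.64 stops"
```

So going from 1/125s to 1/80s is roughly two thirds of a stop, which is about the same as dialing in +2/3 EV on cameras that step exposure compensation in thirds.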
But it may be that for the fine text on the page, you simply don't have enough pixels.
OCR is very difficult in general. You should expect some level of error no matter what you do. But I trust that what you are seeing is indeed excessive.
Re: Are my pics good enough for OCR --> searchable pdf
Thanks for the input.
Yep, it is excessive. I also tried with an S95 camera and got much better results (unfortunately the camera wasn't mine, so I can't use it in my scanner).
Re: Are my pics good enough for OCR --> searchable pdf
The S95 is only 10 MP, which suggests to me you could get more out of your current setup with better settings.
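To put numbers on the pixel-count comparison, here is a rough sketch. It assumes both cameras have a 4:3 frame and that the page's long edge exactly fills the long side of the frame; the 9 inch page height is a made-up example value, not something from this thread, and the helper names are hypothetical.

```python
import math

def frame_pixels(megapixels: float, aspect: float = 4 / 3) -> tuple[float, float]:
    """Approximate (long side, short side) in pixels for a given megapixel count."""
    total = megapixels * 1_000_000
    long_side = math.sqrt(total * aspect)
    return long_side, long_side / aspect

def page_ppi(megapixels: float, page_long_edge_in: float, aspect: float = 4 / 3) -> float:
    """Pixels per inch on the page if its long edge fills the frame's long side."""
    long_side, _ = frame_pixels(megapixels, aspect)
    return long_side / page_long_edge_in

PAGE_LONG_EDGE_IN = 9.0  # hypothetical page height in inches

for name, mp in [("14 MP", 14), ("10 MP", 10)]:
    print(f"{name}: ~{page_ppi(mp, PAGE_LONG_EDGE_IN):.0f} ppi")

# Linear resolution scales with the square root of the pixel count.
ratio = page_ppi(14, PAGE_LONG_EDGE_IN) / page_ppi(10, PAGE_LONG_EDGE_IN)
print(f"linear resolution ratio: {ratio:.2f}")
```

Under those assumptions the 14 MP frame puts about 18% more pixels per inch on the page than the 10 MP one, so pixel count alone would not explain the S95 doing better.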