Library Scanning - Start of new project

cday
Posts: 451
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: Library Scanning - Start of new project

Post by cday »

No shortage of detailed advice, but looking at the images posted originally I think it would be good to back up a bit and try to address an issue with their quality:
2015_02_22_16_05_52_011__Crop.jpg
The scans are at 600DPI, a high resolution that has the potential to provide excellent quality text which should both convert well to black and white and OCR very well, but the small breaks in many characters and the thinness of some lines are probably frustrating both at the moment.

Could those breaks possibly result from the anti-reflection coating you sprayed on the glass? Alternatively, could they result from scanner settings which aren't optimal yet?

A detail, but something probably best investigated before considering image enhancement and OCR in much detail...
fishent
Posts: 6
Joined: 26 Jan 2015, 16:23
Number of books owned: 0
Country: Thailand

Re: Library Scanning - Start of new project

Post by fishent »

Good advice. I'll check the settings on the scanner and take a better look at the output quality. You are so right. A little time at the beginning can save bundles along the way. Thanks for the good advice, this is just what I need, as I am new to this, and realise the implications of "not getting it right at the start". When I get to the OCR stage, your points will be vital.

Right now I have one person starting to scan. I need to address simple issues one by one until we get it right. I will be doing OCR with Omnipage Ultimate (18).
cday
Posts: 451
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: Library Scanning - Start of new project

Post by cday »

I forgot to mention the possibility that the breaks in the text might simply be due to the printed quality of the old texts you are scanning.
fishent wrote: I will be doing OCR with Omnipage Ultimate (18).
Nuance OmniPage and Abbyy FineReader are both excellent products, but given the potentially high quality of your 600DPI scans, Adobe Acrobat's ClearScan could work well and produce both higher image quality, due to its use of synthesised vector fonts, and much smaller file sizes. It depends on your budget and your willingness to experiment, though. The former programs, used in the usual 'text under image' hidden text layer mode, do have the advantage that the original image is displayed even when the OCR process misidentifies a character or word.
BruceG
Posts: 99
Joined: 14 May 2014, 23:17
Number of books owned: 500
Country: Australia

Re: Library Scanning - Start of new project

Post by BruceG »

If you do a search for Fujitsu Scansnap you will find the scanner mentioned a few times at this site. It is mentioned that the machine scans at less than 300dpi and through post-processing gets it up to 300, 600 or 1200dpi. They also mention compression levels. The wavy lines are also mentioned.
Perhaps you could scan at different settings (dpi and compression levels, with and without the glass, i.e. holding the page with fingers/thumb) and upload them as before (one page is enough). Use a new book that has good paper and print. It is difficult to see whether the problems start with the book's print quality or with the scanner. As the scanner scans at less than 90 degrees, bad print may come out worse.
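To keep such test scans apart when comparing them, it may help to name each file after the settings used. A minimal Python sketch follows. The dpi values are the ones mentioned above, but the compression labels, the glass/noglass tags and the file naming scheme are only assumptions for illustration, not actual scanner settings.

```python
from itertools import product

# All values below are illustrative assumptions, not settings read from any scanner.
DPI_VALUES = (300, 600, 1200)
COMPRESSION_LEVELS = ("low", "medium", "high")  # hypothetical labels
GLASS_OPTIONS = ("glass", "noglass")            # with / without the glass platen


def test_scan_names(page="page001"):
    """Return one descriptive file name per combination of settings."""
    return [
        f"{page}_{dpi}dpi_{comp}_{glass}.jpg"
        for dpi, comp, glass in product(DPI_VALUES, COMPRESSION_LEVELS, GLASS_OPTIONS)
    ]


if __name__ == "__main__":
    # Print the checklist of test scans to make, one per line.
    for name in test_scan_names():
        print(name)
```

That gives one name per combination of settings, so every uploaded image says exactly how it was scanned.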