OCR'ing odd ball languages

Convert page images into searchable text. Talk about software, techniques, and new developments here.

Moderator: peterZ

gnosis

OCR'ing odd ball languages

Post by gnosis

I have lots of texts with English mixed with oddball languages. OCR'ing the English is no problem, but does anyone know of an OCR package that can recognize an arbitrary set of Unicode characters in a mixed-language document?

Thanks
steve1066d
Posts: 296
Joined: 27 Nov 2010, 02:26
E-book readers owned: PRS-505
Number of books owned: 1250
Location: Minneapolis, MN

Re: OCR'ing odd ball languages

Post by steve1066d

ABBYY's FineReader is supposed to handle multiple languages pretty well.

This newsletter has some hints on working with multilingual documents:

http://www.abbyydownloads.com/images/ti ... /2011/apr/

They do offer a trial version.
Steve Devore
BookScanWizard, a flexible book post-processor.
gnosis

Re: OCR'ing odd ball languages

Post by gnosis

Looks like ABBYY FineReader offers a menu of supported languages, but I doubt it includes Coptic, which is what I'd like to OCR. Even Microsoft doesn't offer Coptic as a language in Windows. I was hoping there was an OCR package you could train from scratch: when it sees character A, I tell it it's an A, and so on until it has learned all the characters.
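The "tell it this is an A" idea is, at heart, training a classifier from labeled examples. Here is a minimal sketch of that idea in Python, assuming each glyph has already been cut out of the page as an equal-sized black-and-white bitmap (real OCR engines segment the page themselves and use much richer features). `GlyphTrainer`, `teach`, and `recognize` are hypothetical names for illustration, not FineReader's or any other product's API, and the 3x3 bitmaps are made up.

```python
import unicodedata
from dataclasses import dataclass, field

# Hypothetical glyph representation: rows of '#' (ink) and '.' (background).
Bitmap = tuple[str, ...]


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Count the pixels that differ between two same-sized bitmaps."""
    if len(a) != len(b) or any(len(r) != len(s) for r, s in zip(a, b)):
        raise ValueError("bitmaps must have the same dimensions")
    return sum(p != q for r, s in zip(a, b) for p, q in zip(r, s))


@dataclass
class GlyphTrainer:
    """Toy learn-by-example recognizer (1-nearest-neighbor)."""
    examples: list[tuple[Bitmap, str]] = field(default_factory=list)

    def teach(self, glyph: Bitmap, label: str) -> None:
        """The user says: this bitmap is the character `label`."""
        self.examples.append((glyph, label))

    def recognize(self, glyph: Bitmap) -> str:
        """Return the label of the closest taught example."""
        if not self.examples:
            raise LookupError("no examples taught yet")
        _, label = min(self.examples, key=lambda ex: hamming(ex[0], glyph))
        return label


if __name__ == "__main__":
    trainer = GlyphTrainer()
    # Made-up bitmaps, not real Coptic letter shapes.
    trainer.teach(("###", "#.#", "###"), "\u2c9f")  # COPTIC SMALL LETTER O
    trainer.teach(("#.#", "###", "#.#"), "\u2c81")  # COPTIC SMALL LETTER ALFA
    noisy = ("###", "#.#", "##.")  # the "o" bitmap with one pixel missing
    print(unicodedata.name(trainer.recognize(noisy)))  # COPTIC SMALL LETTER O
```

Nearest-neighbor matching is used here only because it is the simplest classifier that literally learns by being shown examples; teaching several samples per character is what would make it tolerate variations in print quality.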
steve1066d

Re: OCR'ing odd ball languages

Post by steve1066d

While Coptic isn't built into FineReader, it is possible to add it. Under Tools > Options > Document > Document Languages, press "Edit languages". There's a "New" button that lets you create an additional language based on any character set. In your case, create a language based on Greek and add the Coptic-specific symbols; then, when you are ready to scan, choose "Train User patterns" under "Read" so it learns to recognize the new symbols.
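If it helps to have the Coptic-specific symbols at hand as text when building a Greek-based language, Unicode places the Coptic letters that have no Greek counterpart (shei, fei, khei, hori, gangia, shima, dei) in the "Greek and Coptic" block at U+03E2 to U+03EF, as capital/small pairs. This short Python sketch just lists them with their official names; `coptic_extras` is a hypothetical helper, and how FineReader's language dialog expects new characters to be entered is not something this example covers.

```python
import unicodedata

# The Coptic letters without Greek equivalents, U+03E2..U+03EF inclusive,
# inside Unicode's "Greek and Coptic" block.
COPTIC_IN_GREEK_BLOCK = range(0x03E2, 0x03F0)


def coptic_extras() -> list[tuple[str, str]]:
    """Return (character, Unicode name) for each Coptic-specific letter."""
    return [(chr(cp), unicodedata.name(chr(cp))) for cp in COPTIC_IN_GREEK_BLOCK]


if __name__ == "__main__":
    for ch, name in coptic_extras():
        print(f"U+{ord(ch):04X} {ch} {name}")
    # All 14 characters as one string, e.g. to copy into a character list.
    print("".join(ch for ch, _ in coptic_extras()))
```

Unicode also has a separate Coptic block (U+2C80 to U+2CFF) that encodes every Coptic letter, including the Greek-looking ones, as distinct characters, so it may be worth deciding which encoding you want the OCR output to use before training.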
gnosis

Re: OCR'ing odd ball languages

Post by gnosis

Thanks Steve that sounds exactly like what I need.