OCR'ing odd ball languages

gnosis · Post by **gnosis** » 24 Apr 2011, 10:39

I have lots of texts with English mixed with oddball languages. OCR'ing the English is no problem, but does anyone know of an OCR package that can recognize an arbitrary Unicode character set in a mixed-language document?

Thanks

steve1066d · Post by **steve1066d** » 25 Apr 2011, 17:45

ABBYY's FineReader is supposed to handle multiple languages pretty well.

This newsletter has some hints on working with multiple language documents:

http://www.abbyydownloads.com/images/ti ... /2011/apr/

They do offer a trial version.

gnosis · Post by **gnosis** » 26 Apr 2011, 12:03

It looks like ABBYY FineReader offers a menu of accepted languages, but I doubt it offers Coptic, which is what I'd like to OCR. Even Microsoft does not offer it as a valid language in Windows. I was hoping there was some OCR package that you could train from scratch: when it sees character A, I tell it it's an A, and so on until it learns the characters.

steve1066d · Post by **steve1066d** » 26 Apr 2011, 14:23

While Coptic isn't built into FineReader, it is possible to add it. Under Tools, Options, Document, Document Languages, press "Edit languages". There's a new button, which will allow you to create an additional language based on any character set. So in your case, create a language based on Greek, add the Coptic-specific symbols, then when you are ready to scan, under "Read", choose "Train User patterns" to recognize the new symbols.
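
To work out which Coptic-specific symbols need adding on top of Greek, it can help to list them from Unicode first. The following is only a rough Python sketch, not anything FineReader provides: it prints the characters whose Unicode names start with "COPTIC" in the two ranges where Coptic letters live (U+03E2 to U+03EF in the "Greek and Coptic" block, and the separate "Coptic" block at U+2C80 to U+2CFF). The function name `coptic_characters` is made up for illustration, and whether the language editor accepts these characters pasted in, or whether your fonts display them, is an assumption you would need to check.

```python
import unicodedata

# Code point ranges that hold Coptic characters in Unicode.
COPTIC_RANGES = [
    (0x03E2, 0x03EF),  # Coptic letters inside the "Greek and Coptic" block
    (0x2C80, 0x2CFF),  # the dedicated "Coptic" block
]


def coptic_characters():
    """Return the assigned characters in COPTIC_RANGES named 'COPTIC ...'.

    Hypothetical helper for illustration; unassigned code points are skipped.
    """
    chars = []
    for start, end in COPTIC_RANGES:
        for cp in range(start, end + 1):
            # unicodedata.name() returns the default "" for unassigned code points
            name = unicodedata.name(chr(cp), "")
            if name.startswith("COPTIC"):
                chars.append(chr(cp))
    return chars


if __name__ == "__main__":
    chars = coptic_characters()
    print(len(chars), "characters")
    print("".join(chars))
```

The printed string is the candidate set to add to the Greek-based language; symbols that never occur in your texts (punctuation, fraction signs) can be left out before training.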

gnosis · Post by **gnosis** » 27 Apr 2011, 08:03

Thanks, Steve, that sounds exactly like what I need.

DIY Book Scanner
