I have lots of texts in which English is mixed with oddball languages. OCR'ing the English is no problem, but does anyone know of an OCR package that can recognize an arbitrary Unicode character set in a mixed-language document?
Thanks
OCR'ing odd ball languages
Moderator: peterZ
-
- Posts: 296
- Joined: 27 Nov 2010, 02:26
- E-book readers owned: PRS-505
- Number of books owned: 1250
- Location: Minneapolis, MN
- Contact:
Re: OCR'ing odd ball languages
ABBYY's FineReader is supposed to handle multiple languages pretty well.
This newsletter has some hints on working with multiple language documents:
http://www.abbyydownloads.com/images/ti ... /2011/apr/
They do offer a trial version.
Steve Devore
BookScanWizard, a flexible book post-processor.
Re: OCR'ing odd ball languages
Looks like ABBYY FineReader offers a menu of accepted languages, but I doubt it offers Coptic, which is what I'd like to OCR. Even Microsoft does not offer it as a valid language in Windows. I was hoping there was some OCR package that you could train from scratch: when it sees character A, I tell it it's an A, and so on until it learns the characters.
Re: OCR'ing odd ball languages
While Coptic isn't built into FineReader, it is possible to add it. Under Tools, Options, Document, Document Languages, press "Edit languages". There's a "New" button, which will allow you to create an additional language based on any character set. So in your case, create a language based on Greek and add the Coptic-specific symbols. Then, when you are ready to scan, choose "Train User patterns" under "Read" so FineReader learns to recognize the new symbols.
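For putting together the list of Coptic-specific symbols, a small script can enumerate the candidates from Unicode. This is only a rough sketch: it assumes you want the Coptic letters in the "Greek and Coptic" block (U+03E2 to U+03EF) plus the dedicated Coptic block (U+2C80 to U+2CFF), and the function name is made up for illustration. Whether FineReader's language editor accepts all of them is not something this checks.

```python
import unicodedata

# Assumption: these two ranges cover the Coptic characters of interest.
# U+03E2..U+03EF: Coptic letters encoded inside the "Greek and Coptic" block.
# U+2C80..U+2CFF: the dedicated Coptic block (some code points are unassigned).
COPTIC_IN_GREEK_BLOCK = range(0x03E2, 0x03F0)
COPTIC_BLOCK = range(0x2C80, 0x2D00)


def coptic_characters():
    """Return (character, Unicode name) pairs for assigned Coptic code points."""
    chars = []
    for cp in list(COPTIC_IN_GREEK_BLOCK) + list(COPTIC_BLOCK):
        ch = chr(cp)
        # Unassigned code points have no name; the default "" filters them out.
        name = unicodedata.name(ch, "")
        if name.startswith("COPTIC"):
            chars.append((ch, name))
    return chars


if __name__ == "__main__":
    # Print code points and names only, to avoid console font/encoding trouble.
    for ch, name in coptic_characters():
        print(f"U+{ord(ch):04X}  {name}")
```

The printed list can then be used as a checklist when adding symbols to the new Greek-based language.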
Steve Devore
BookScanWizard, a flexible book post-processor.