Version of ABBYY?

ai4px
Posts: 33
Joined: 12 Dec 2012, 12:47
Number of books owned: 0
Country: United States

Version of ABBYY?

Post by ai4px »

Is there a particular version of ABBYY that is suited to assembling my JPGs into a PDF and doing OCR? There seem to be a lot of versions of their software.
mrwarper
Posts: 18
Joined: 29 Dec 2012, 21:50
E-book readers owned: 10x iRex DR1000, 15x iRex DR800
Number of books owned: 10000
Country: Spain

Re: Version of ABBYY?

Post by mrwarper »

I used ABBYY 7 for ages, because it was the only one that did a decent OCR job, even if it left insane amounts of post-processing to the user (such as de-hyphenating words split at line breaks) for which I wrote custom scripts anyway. Every other OCR I tested yielded inferior results, and every version after 7 was bloated, much slower and not significantly better in any way that was meaningful to me.
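For readers wondering what that kind of de-hyphenation post-processing can look like, here is a minimal Python sketch. It is not the actual scripts mentioned above, just an illustration under some assumptions: the OCR output is plain text, a split word ends its line with a hyphen, and the continuation starts with a lowercase ASCII letter. The function name `dehyphenate` is hypothetical.

```python
import re

# A word fragment, a hyphen at the end of the line, then a continuation that
# starts with a lowercase ASCII letter at the beginning of the next line.
# Assumption: capitalised continuations (e.g. "Anglo-\nAmerican") are real
# hyphenated compounds and are left untouched.
_SPLIT_WORD = re.compile(r"(\w+)-\n[ \t]*([a-z]\S*)[ \t]*\n?")


def dehyphenate(text: str) -> str:
    """Rejoin words that were split with a hyphen at a line break.

    The rejoined word stays on the first line and the line break is
    moved to just after it.
    """
    return _SPLIT_WORD.sub(r"\1\2\n", text)


if __name__ == "__main__":
    sample = "This is an exam-\nple of OCR output."
    print(dehyphenate(sample))
```

Note that this simple heuristic will also merge genuine compounds that happen to break at the hyphen (for instance "post-" / "processing" becomes "postprocessing"), so a dictionary check would be needed for cleaner results.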

All of that changed with ABBYY 11. It's still heavier on the machine, but it is finally what I would call 'suitable for human consumption', and now I don't need most of my scripts. I'm not sure what you mean by 'assembling my JPGS to PDF and doing OCR', though. I do something like that, but my workflow is peculiar, so I'm not sure it is exactly what you have in mind. I currently assemble PDFs from images with IrfanView, but I'm writing more custom software to eventually phase it out and do it from the command line.

HTH.
stearn
Posts: 18
Joined: 22 Dec 2011, 20:00
E-book readers owned: kindle
Number of books owned: 4000
Location: Nr. London, UK

Re: Version of ABBYY?

Post by stearn »

FineReader allows you to load a series of JPEGs (or other image files), run OCR, and then save to various file formats; PDF with the text embedded is one of them. There are various settings for the PDF output, so you can play around with them until you hit the file size that suits you.

Acrobat X will allow you to batch process image files to produce text-embedded PDFs, but it turns each image into its own individual PDF. So if you want a folder of JPEGs to end up as one multi-page PDF, you will have to assemble them afterwards. That is easy enough to do in Acrobat, but it is another step.

For OCR I would always let the software work from the original image files, as converting them to any other format before performing OCR may add extra levels of compression and artefacts and reduce the quality of the OCR.