ABBYY for compression?

Post by recaptcha » 15 Jun 2020, 14:49

I heard that ABBYY FineReader does a better job at compression than Acrobat. I’ve been using Acrobat for OCR and for making compressed PDFs, and I have not been satisfied with the results.

Q1: Can I use ABBYY to re-compress my existing PDFs, or would I have to use ABBYY to create the PDFs from scratch?

Q2: Also, is the ABBYY Sprint that comes bundled with document scanners able to compress PDFs as well as the full version of ABBYY FineReader?

Re: ABBYY for compression?

Post by BruceG » 16 Jun 2020, 04:08

I am happy to run some tests for you. I use Acrobat, ABBYY Sprint and ABBYY FineReader.
I use Acrobat at the start for cropping and rotating pages, and then at the end to index a group of PDFs.
If I only want a simple OCR (no editing of any kind), I use Sprint. I also use it to reduce the size of the PDF before using FineReader.
Depending on what sort of output you want, FineReader lets you save PDFs in three ways, which affect file size: Text on top, Image on top, or Text & Picture (i.e. no page image). So there are a number of variations.
How you would like to use the material often dictates which process is best.
It looks like you are using Image on top, so the text is not visible. To check the quality of the OCR, I often save with Text on top to see whether editing is required.
If you can put a project somewhere like Dropbox, I am happy to process it in different ways to compare the compression rates.
Here are results from a recent newspaper project (351 pages, 60 dpi). The percentages are approximate and give each output's size relative to the original scans:
Scans: 1048 MB
Sprint 12: 529 MB (50%)
Text on top: 386 MB (35%)
Image on top: 202 MB (20%)
Text & Picture: 23 MB (2%)
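The percentages in this list work out as each output's size relative to the original scans (for example, 202 MB is about 20% of 1048 MB), so the space actually saved is 100% minus that figure. A minimal Python sketch that computes both numbers from the sizes in the post; the labels and function names are illustrative only and are not part of any ABBYY or Adobe tool.

```python
# Sizes in MB reported for the 351-page newspaper project.
SCANS_MB = 1048
OUTPUTS_MB = {
    "Sprint 12": 529,
    "Text on top": 386,
    "Image on top": 202,
    "Text & Picture": 23,
}


def percent_of_original(size_mb: float, original_mb: float) -> float:
    """Output size as a percentage of the original scans."""
    return 100.0 * size_mb / original_mb


def percent_reduction(size_mb: float, original_mb: float) -> float:
    """How much smaller the output is than the original, in percent."""
    return 100.0 - percent_of_original(size_mb, original_mb)


for name, size in OUTPUTS_MB.items():
    print(f"{name:15s} {size:5d} MB  "
          f"{percent_of_original(size, SCANS_MB):5.1f}% of original  "
          f"({percent_reduction(size, SCANS_MB):4.1f}% smaller)")
```

Run as-is, it reports about 36.8% of the original for Text on top (63.2% smaller) and about 2.2% for Text & Picture (97.8% smaller), close to the rounded figures in the post.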

Re: ABBYY for compression?

Post by BruceG » 16 Jun 2020, 08:00

I was interested to see whether there was any space saving from running Sprint 12 before FineReader.
I was scanning 6 months of a periodical today, so I used that as my test case:
365 pages, colour, 600 dpi, page size between A4 and A3.

Scans: 2850 MB

Sprint then FineReader, Text on top: 382.782 MB (13.4%)
Sprint then FineReader, Image on top: 73.38 MB (2.57%)
Sprint then FineReader, Text & Picture: 9.336 MB (0.32%)

FineReader only, Text on top: 392.931 MB (13.78%)
FineReader only, Image on top: 72.95 MB (2.55%)
FineReader only, Text & Picture: 8.684 MB (0.3%)
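To see how much the extra Sprint pass actually changed things, the two sets of figures can be compared mode by mode. A minimal Python sketch using the sizes reported above; the function name `sprint_saving` is hypothetical, and a positive saving means the Sprint-first file was smaller.

```python
# Sizes in MB reported for the 365-page periodical (original scans: 2850 MB).
# Each entry is (Sprint then FineReader, FineReader only).
RESULTS_MB = {
    "Text on top": (382.782, 392.931),
    "Image on top": (73.38, 72.95),
    "Text & Picture": (9.336, 8.684),
}


def sprint_saving(sprint_first_mb: float, direct_mb: float) -> tuple[float, float]:
    """Space saved by running Sprint first, in MB and as a % of the direct output.

    Positive values mean the Sprint-first file was smaller; negative means larger.
    """
    saved_mb = direct_mb - sprint_first_mb
    return saved_mb, 100.0 * saved_mb / direct_mb


for mode, (sprint_first, direct) in RESULTS_MB.items():
    saved_mb, saved_pct = sprint_saving(sprint_first, direct)
    print(f"{mode:15s} Sprint first {sprint_first:8.3f} MB, direct {direct:8.3f} MB, "
          f"saving {saved_mb:+7.3f} MB ({saved_pct:+.2f}%)")
```

With these numbers, only Text on top comes out smaller with Sprint first (about 10.1 MB, or 2.6%); Image on top and Text & Picture are slightly larger (by about 0.6% and 7.5%), which is consistent with the conclusion that the extra step did not save much.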

In this case, the extra time spent running Sprint first did not result in much saving.
This publication was from 1864, so it was 99.99% text to start with.
No editing was done at all.

It would be a good exercise to repeat this with something containing a number of pictures for comparison, and also with grayscale or B&W scans.

Re: ABBYY for compression?

Post by cday » 16 Jun 2020, 11:38

I think that file size comparisons in relation to different workflows would need to be interpreted carefully, given the large number of variables, particularly in relation to compression options but also potentially in relation to source image colour mode and file formats.

Given the quality of ABBYY recognition software in general, I imagine that Sprint is an excellent product when included with a scanner, although it will lack many of the advanced features available in the full FineReader software, which may or may not be needed in practice. However, its recognition accuracy, although no doubt excellent on good-quality scans, is unlikely to reach the standard of the latest FineReader version when processing more difficult images.

On the question of whether there could be a benefit in using Sprint as a step before FineReader, I really can't see why that would be worthwhile; any apparent benefit in a particular case probably arises from the particular combination of settings used in the two programs. It is probably best to invest time in understanding the many options in FineReader and then go straight to that. If file size is important, as it often is, there is quite a lot to know about the optimum file format, colour mode and compression option to use, and FineReader supports some more advanced options.
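The effect of colour mode on file size can be made concrete with a back-of-envelope calculation of a page's uncompressed bitmap size, before any compression option is applied. A minimal Python sketch, assuming an A4 page (210 × 297 mm) at 300 DPI and 24-bit colour, 8-bit grayscale and 1-bit black & white, with each pixel row padded to a whole byte; the function names are hypothetical, and the results are raw sizes, not predictions of what FineReader or Acrobat will output.

```python
import math

MM_PER_INCH = 25.4


def pixel_dimensions(width_mm: float, height_mm: float, dpi: int) -> tuple[int, int]:
    """Convert a physical page size to pixel dimensions at a given resolution."""
    return (round(width_mm / MM_PER_INCH * dpi), round(height_mm / MM_PER_INCH * dpi))


def raw_size_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Uncompressed bitmap size, assuming each row is padded to a whole byte (as in PBM)."""
    bytes_per_row = math.ceil(width_px * bits_per_pixel / 8)
    return bytes_per_row * height_px


COLOUR_MODES = {"colour (24-bit)": 24, "grayscale (8-bit)": 8, "black & white (1-bit)": 1}

w, h = pixel_dimensions(210, 297, 300)  # A4 at 300 DPI
for mode, bpp in COLOUR_MODES.items():
    size = raw_size_bytes(w, h, bpp)
    print(f"{mode:22s} {size / 1e6:6.2f} MB uncompressed")
```

For an A4 page this gives roughly 26.1 MB in colour, 8.7 MB in grayscale and 1.1 MB in black & white, a 24:8:1 ratio before compression; actual file sizes then depend heavily on the compression option and file format chosen.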

Regarding Adobe Acrobat, the recognition accuracy in versions up to maybe Acrobat 9 or possibly 10 was known to be significantly below that of FineReader or Nuance OmniPage, but I believe it is probably much better in later versions. Where Acrobat really scores is when processing high-quality scans using the ClearScan option, which as far as possible outputs vectorised text, giving both excellent image quality and potentially very small file sizes. The input images do need to be good quality, at 300 DPI or more, which is more the domain of flatbed scans. Confusingly, Adobe has dropped the ClearScan name in the latest Acrobat version, although the option is still included.

Re: ABBYY for compression?

Post by glenleslie » 23 Aug 2020, 23:39

I've noticed significant differences in OCR results between Sprint (mainly Sprint v9; the update log for v12 doesn't mention any OCR improvements) and FineReader. I have FineReader 12 Pro, and the differences on recurring words in the text are quite significant. Even though Sprint recognizes all the text regions in the document, its OCR of that text is not the same as FineReader's. I also have Sprint 6, and it has the same OCR misses on the same scans.

My test was with 75 pages scanned 2-up from a small paperback book as 300 DPI grayscale JPGs: not the greatest, but more than sufficient for text. Pages that were slightly skewed in the scan (angled up/down or trapezoidal) seemed to cause the most OCR misses in Sprint.

P.S. You'll see my separate question about how to fix FineReader pages that have been split from this 2-up format (i.e. 2 facing pages of a scanned book): FineReader splits them into individual pages, but I'm struggling to get it to then remove the page edges from the scans and center the text.