Reduce pdf file size

Moderator: peterZ

cday
Posts: 451
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: Reduce pdf file size

Post by cday »

0kelvin wrote:How much quality do I lose by using maximum compression of black and white pages?
Black and white images (text and line drawings, but not halftone shades of gray or photographs) compress very well compared with grayscale or colour images. Very small file sizes can be obtained if the images are saved as TIFFs with CCITT G4 ('fax') compression, or with the newer but less widely supported JBIG2 compression.

Both types of compression are actually lossless, so there is no loss of quality despite the very small file sizes compared with grayscale or colour images. They can be applied directly to image files (although free-standing JBIG2 files are not widely used) or to images contained in a PDF file. The PDF container generally adds very little overhead to the total size of the images, although OCR'ing the file to make its text searchable naturally increases the final file size.

Edit:

JBIG2 compression exists in lossless and lossy versions, so the statement above applies only to the lossless version. The lossy version can sometimes shrink files to much smaller sizes, but at high compression settings there is some risk that characters are misidentified, so that incorrect characters may be substituted in the text.
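
As a rough illustration of why 1-bit pages can be stored so compactly without any loss, here is a minimal Python sketch. It uses zlib from the standard library purely as a stand-in: it is not CCITT G4 or JBIG2, and real fax-style coders usually do better on scanned text. The page size and the synthetic "text" pattern are assumptions made up for the example.

```python
import zlib

WIDTH, HEIGHT = 800, 1000  # hypothetical page size in pixels (width divisible by 8)

def make_page(width, height):
    """Build a synthetic 1-bit page: white background with bands of black dashes."""
    rows = []
    for y in range(height):
        row = [0] * width
        # Every 40 rows, a 12-row band of short black dashes stands in for a text line.
        if y % 40 < 12:
            for x in range(100, width - 100):
                if x % 30 < 18:
                    row[x] = 1
        rows.append(row)
    return rows

def pack_bits(rows):
    """Pack 8 pixels into each byte, most significant bit first."""
    out = bytearray()
    for row in rows:
        for i in range(0, len(row), 8):
            byte = 0
            for bit in row[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
    return bytes(out)

packed = pack_bits(make_page(WIDTH, HEIGHT))   # raw 1-bit image data
compressed = zlib.compress(packed, 9)          # lossless stand-in codec
restored = zlib.decompress(compressed)

print("raw bytes:       ", len(packed))
print("compressed bytes:", len(compressed))
print("bit-exact round trip:", restored == packed)
```

The round-trip check is the point: the decompressed data is identical to the input, which is what "lossless" means for CCITT G4 and lossless JBIG2 as well.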
0kelvin
Posts: 29
Joined: 10 Nov 2012, 17:14
Number of books owned: 0
Country: Brazil

Re: Reduce pdf file size

Post by 0kelvin »

It seems that FineReader lacks an option to choose between compression algorithms; there is only lossless or lossy to choose from. When the choice is lossy, it says JBIG2 and CCITT.
cday
Posts: 451
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: Reduce pdf file size

Post by cday »

[Deleted pending further investigation].
cday
Posts: 451
Joined: 19 Mar 2013, 14:55
Number of books owned: 0
Country: UK

Re: Reduce pdf file size

Post by cday »

0kelvin wrote:It seems that FineReader lacks an option to choose between compression algorithms; there is only lossless or lossy to choose from. When the choice is lossy, it says JBIG2 and CCITT.
In FineReader 10, Tools > Options... > Save > PDF allowed a specific compression method to be set for black and white images:

[Screenshot: FR10_Options.png]
[Screenshot: FR10_CCITT G4.png]
[Screenshot: FR10_JBIG2.png]

FineReader 12 has a modified interface based on whether or not compression must be lossless:

[Screenshot: FR12_Lossless.png]
[Screenshot: FR12_Lossy.png]

The interface change was probably intended to make a fairly complex program easier to use, but one effect appears to be slightly less control over the compression method actually used when a PDF file is created.

Where the screenshots above show alternative compression methods with no means of selecting one, FineReader may automatically work out and use whichever method gives the smaller file. For example, lossless JBIG2 probably produces smaller files than CCITT G4 for most black and white images, but there may be occasions when CCITT G4 is more efficient, in which case it could be selected instead. That FineReader operates this way is only conjecture, but it would make sense of the interface change.
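
The "try each method and keep the smaller result" idea conjectured above can be sketched in a few lines of Python. This is only an illustration of the selection logic, not FineReader's actual behaviour: the codecs are general-purpose ones from the standard library standing in for CCITT G4 and JBIG2, and the function name and sample data are hypothetical.

```python
import bz2
import lzma
import zlib

# Stand-in codecs (assumption): general-purpose compressors, not CCITT G4 / JBIG2.
CODECS = {
    "zlib": lambda data: zlib.compress(data, 9),
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}

def smallest_encoding(data, codecs):
    """Compress data with every codec and return (name, blob) of the smallest result."""
    results = {name: encode(data) for name, encode in codecs.items()}
    best = min(results, key=lambda name: len(results[name]))
    return best, results[best]

# Hypothetical sample: alternating runs of black (0xFF) and white (0x00) bytes.
page = bytes([0xFF] * 50 + [0x00] * 50) * 1000

name, blob = smallest_encoding(page, CODECS)
print(f"chosen codec: {name}, {len(blob)} bytes from {len(page)} raw bytes")
```

The cost of this approach is that every page is encoded more than once, which may be acceptable when output size matters more than conversion time.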

But as the object is to produce the optimum PDF file for the content, the actual compression method used really doesn't matter to the end user, as the resulting file will be viewable in any standard PDF viewer.
0kelvin
Posts: 29
Joined: 10 Nov 2012, 17:14
Number of books owned: 0
Country: Brazil

Re: Reduce pdf file size

Post by 0kelvin »

I ran a quick test here. FineReader is multi-process, so it can read multiple pages at the same time, and it is much faster than Acrobat Pro. So what if I create the PDF in FineReader and leave Acrobat to do the ClearScan? Acrobat imports pages one by one, which is horribly slow.

I randomly chose 11 pages from a Fluid Mechanics book scanned at 300 dpi and interpolated to 600 dpi in Scan Tailor. All pages were 1-bit, with no photos.

Using Acrobat to import the 11 pages, save as PDF with lossy JBIG2 compression, and then run ClearScan produced a 470 KB file.

Using FineReader to import the 11 pages and save as PDF at 10% quality, then opening the file in Acrobat to run ClearScan, produced a 378 KB file.

So in this test FineReader was faster and produced a smaller file than Acrobat.
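
To put the two results in proportion, the saving can be worked out from the figures quoted above (470 for the Acrobat-only route, 378 for the FineReader route). The variable names are just for illustration, and the result holds for this 11-page sample only.

```python
acrobat_size = 470      # Acrobat import + lossy JBIG2 + ClearScan, as reported above
finereader_size = 378   # FineReader at 10% quality + ClearScan, as reported above

# Fraction of the Acrobat file size saved by the FineReader route.
saving = (acrobat_size - finereader_size) / acrobat_size

print(f"FineReader route is {saving:.1%} smaller")  # FineReader route is 19.6% smaller
```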
0kelvin
Posts: 29
Joined: 10 Nov 2012, 17:14
Number of books owned: 0
Country: Brazil

Re: Reduce pdf file size

Post by 0kelvin »

MRC compression with 10% quality yields a PDF that is extremely slow to render.
muscleriot
Posts: 4
Joined: 18 Nov 2015, 14:14
E-book readers owned: tablets
Number of books owned: 600
Country: UK

Re: Reduce pdf file size

Post by muscleriot »

Basically, to massively reduce the size of PDFs you need to OCR the text in each page image and have that text mapped to one of the standard PDF fonts, while preserving the images.
(PDF readers have traditionally supplied a set of 14 standard fonts themselves, so those fonts do not have to be embedded in the PDF document.)

The only program I have found that does this well is Nuance's OmniPage, which produces very small PDFs, typically 2 MB to 5 MB for 300-page books with images. Its OCR is also far better at handling bent and skewed text at page edges. You can get it on a 15-day trial.
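
For a sense of scale, the quoted range of 2 MB to 5 MB for a 300-page book can be converted to an average size per page. This is a small sketch that assumes binary megabytes (1 MB = 1024 KB); the helper name is hypothetical.

```python
KB_PER_MB = 1024  # assumption: binary megabytes

def kb_per_page(total_kb, pages):
    """Average file size per page in kilobytes."""
    return total_kb / pages

low = kb_per_page(2 * KB_PER_MB, 300)
high = kb_per_page(5 * KB_PER_MB, 300)

print(f"{low:.1f} to {high:.1f} KB per page")  # 6.8 to 17.1 KB per page
```

Per-page figures like these are only a rough yardstick, since books differ in page size, resolution, and how many images they contain.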