Re: Learning to Create Tiny DJVU files
Many of the books are half the size of the ClearScan PDF once converted with the process outlined here (minidjvu for the black-and-white portions, c44 for any background images at 1/4th size, manually mashed together with djvumake and scripts). I have even seen some that are 1/8th the size of the PDF. Others come out about the same size, and two or three books have been slightly larger. I can't predict which outcome I'll get before the conversion, as there doesn't appear to be any rhyme or reason to it. I have to imagine that sometimes ClearScan builds up lots of redundant font images, which DjVu/JB2 manages to share between pages.
Example: Plato's Republic, 397 pages, all black and white except the cover image.
- PDF: 14MB ClearScan
- DJVU with OCR: 6MB
- DJVU without OCR: 4MB
- PDF: 19MB ClearScan
- DJVU with OCR: 11MB
- DJVU without OCR: 8MB
I've generally been dropping the OCR data in the conversion, because when I look at it... it's not that great. ClearScan OCR has tons of mistakes in it, and I don't search in my PDFs that often in the first place. Leaving it out saves 2 or 3 megabytes on large books, so I've been doing that. In the numbers above I gave the with-OCR sizes to make the comparison to the OCR'ed PDFs fair.
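For anyone who wants the ratios rather than raw megabytes, here is a quick sketch of the arithmetic on the two sets of numbers above. The dictionary keys and the `summarize` helper are just names I made up for illustration, and the sizes are the rounded figures from this post, so treat the percentages as approximate.

```python
# Sizes in MB, copied from the two examples in this post (rounded figures).
examples = [
    {"pdf": 14, "djvu_ocr": 6, "djvu_no_ocr": 4},
    {"pdf": 19, "djvu_ocr": 11, "djvu_no_ocr": 8},
]

def summarize(sizes):
    """Return (DjVu-with-OCR size as a fraction of the PDF, MB saved by dropping OCR)."""
    ratio = sizes["djvu_ocr"] / sizes["pdf"]
    ocr_cost = sizes["djvu_ocr"] - sizes["djvu_no_ocr"]
    return ratio, ocr_cost

for sizes in examples:
    ratio, ocr_cost = summarize(sizes)
    print(f"PDF {sizes['pdf']}MB -> DjVu {sizes['djvu_ocr']}MB "
          f"({ratio:.0%} of the PDF), OCR layer costs {ocr_cost}MB")
```

With these inputs the with-OCR DjVu files come out at roughly 43% and 58% of the PDF size, and the OCR layer accounts for 2MB and 3MB respectively.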
[EDIT: I should say all the ClearScan files were created with Acrobat X, and saved as "Reduced Size PDF" afterward. Don't know if Acrobat XI does a better job or not.]