HOW TO COMPRESS PDF FILES

reggilbert
Posts: 49
Joined: 28 Sep 2010, 19:57
Number of books owned: 3000
Location: Buffalo, New York

Re: HOW TO COMPRESS PDF FILES

Post by reggilbert » 18 Jan 2011, 22:31

emmerick wrote:What is the average size of the PDF files of the books you have scanned? A 700-page book I scanned comes to around 90 to 100 MB. Can that be reduced?
The following draws on flatbed experience but might help with camera-based scanning. The software that comes with my scanner permits a choice of several output formats. The default is JPEG. Unfortunately, Acrobat compiles JPEGs very inefficiently -- the resulting file sizes are like the one you mention, emmerick -- up to 100MB for, say, only 500 images (an image often contains two pages on a flatbed) -- and that is just for b&w images. TIFF images aggregated poorly as well.

But for some reason the BMP format works far better in Acrobat, with no apparent reduction in resolution, either before, as a scan output choice, or after, as an Acrobat input choice. I think BMP is an uncompressed or optionally minimally compressed format, so maybe Acrobat has a lot to work with. I just scanned an 850-page book (425 images) in 300-dpi greyscale (greyscale seems to work better for Acrobat OCR) and Acrobat brought them all in, plus added its OCR, for a total output size of 40MB. Keep in mind that the average source image is 8MB and you can enlarge the resulting PDF pages to 400 percent with virtually no loss of sharpness. I find that very impressive. B&w would have been 7MB or so.
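
As a rough check on those numbers, here is a minimal sketch of how large an uncompressed 8-bit greyscale BMP should be at 300 dpi. The letter-size scan area and the function name `grey_bmp_bytes` are assumptions for illustration; the post does not state the scan dimensions.

```python
def grey_bmp_bytes(width_in, height_in, dpi=300):
    """Approximate size in bytes of an uncompressed 8-bit greyscale BMP."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px + 3) // 4 * 4   # BMP pads each row to a multiple of 4 bytes
    headers = 14 + 40 + 256 * 4           # file header, info header, 256-entry grey palette
    return headers + row_bytes * height_px

# Assumption: a letter-size (8.5 x 11 in) scan area; the post does not give one.
per_image = grey_bmp_bytes(8.5, 11)
print(f"one image: {per_image / 2**20:.1f} MiB")   # one image: 8.0 MiB

# 425 images and a 40MB PDF are the figures quoted in the post.
total = 425 * per_image
print(f"all images: {total / 2**30:.1f} GiB")
print(f"source-to-PDF ratio: about {total / (40 * 2**20):.0f}:1")
```

Under that assumption the per-image size lands close to the 8MB quoted above, and the 40MB result implies that Acrobat recompresses the images heavily when it builds the PDF rather than storing the BMP data as-is.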

This information may or may not be of any use to camera scanners. I don't believe cameras have a BMP option, and as far as I can tell in tests just now, Scan Tailor does not accept BMP images, so I assume it cannot output them either. (But I could swear Scan Tailor was able to do so a couple of months ago, when I tested its page-splitting power - it did a great job. And those had to be BMPs, but I can't find the test anymore.)

So if cameras don't put out BMP and in any case Scan Tailor does not (again, that may be incorrect), that leaves conversion of another format, camera source files or Scan Tailor output, to BMP. But that could lose some resolution and may result in huge files anyway.

On the other hand, if you have RAW source files, which have to be converted to something in any case, and do not need to use Scan Tailor (if cropping and OCR are all you need, Acrobat can handle that), then maybe converting to BMP and aggregating the resulting images in Acrobat could be an option for creating smaller Acrobat books.

emmerick

Re: HOW TO COMPRESS PDF FILES

Post by emmerick » 19 Jan 2011, 06:14

Good morning, friend. Thanks for the tips, but my camera only outputs JPG, so I will do some testing here. Would a program that converts JPG to BMP give the same result without loss? It is worth a try.
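
On the question of loss: an uncompressed BMP stores pixel values as they are, so converting a decoded JPG to BMP should add no further loss, but it cannot bring back detail the JPG compression already discarded. A minimal sketch of the first half of that claim, using only the Python standard library: it writes an 8-bit greyscale BMP and reads the same pixel values back. The function names `write_grey_bmp` and `read_grey_bmp` are made up for illustration, and decoding the JPG itself is left out.

```python
import struct

def write_grey_bmp(rows):
    """Encode rows of 0-255 grey values as an uncompressed 8-bit BMP (bytes)."""
    height, width = len(rows), len(rows[0])
    stride = (width + 3) // 4 * 4                      # rows are padded to 4 bytes
    palette = b"".join(bytes((i, i, i, 0)) for i in range(256))
    offset = 14 + 40 + len(palette)                    # where pixel data starts
    # BMP stores rows bottom-up, so the last row is written first.
    pixels = b"".join(bytes(r) + b"\x00" * (stride - width) for r in reversed(rows))
    file_header = struct.pack("<2sIHHI", b"BM", offset + len(pixels), 0, 0, offset)
    info_header = struct.pack("<IiiHHIIiiII", 40, width, height, 1, 8,
                              0, len(pixels), 0, 0, 256, 0)
    return file_header + info_header + palette + pixels

def read_grey_bmp(data):
    """Decode a BMP produced by write_grey_bmp back into rows of grey values."""
    offset = struct.unpack_from("<I", data, 10)[0]
    width, height = struct.unpack_from("<ii", data, 18)
    stride = (width + 3) // 4 * 4
    rows = [list(data[offset + y * stride: offset + y * stride + width])
            for y in range(height)]
    return rows[::-1]                                  # undo the bottom-up order

sample = [[0, 128, 255], [10, 20, 30]]
bmp = write_grey_bmp(sample)
print(read_grey_bmp(bmp) == sample)   # True: the pixel values survive unchanged
```

The price is size: every pixel takes a full byte, which is why the BMP files come out so much larger than the JPGs they were made from.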

emmerick

Re: HOW TO COMPRESS PDF FILES

Post by emmerick » 19 Jan 2011, 08:35

I reduced a PDF file from 100 MB to 40 MB in Adobe Acrobat X: go to File, open the PDF file, then go to Save As, choose the optimized PDF option, and leave the options as they are. Here, at least, it went from 100 to 40 MB and the quality was still very good. Try changing the options to see if you get something better, because I don't quite understand those options.

seasalt

Re: HOW TO COMPRESS PDF FILES

Post by seasalt » 02 Jun 2011, 07:47

On Mac, I've been trying different things. Best so far:

Example:
original scan images: JPEG, 300 dpi
front and back cover: full colour, 48-bit
rest of pages: b&w
pages: 360

Starting from either a 20 MB or a 78 MB OCR PDF (the OCR tool was ABBYY Express, with layered images/text), both gave:

Acrobat X, Save As with the reduce file size option: 7.6 MB
Acrobat X, Save As optimized: 9.9 MB

Then, if I can get Homebrew installed in order to install PDFbeads, I am hoping to get under 5 MB.

rubypdf

Re: HOW TO COMPRESS PDF FILES

Post by rubypdf » 04 Oct 2011, 12:33

emmerick wrote:

This is for Linux. I use Windows :( Thanks

I have put some work into a Windows version of pdfsizeopt.