Scanned 670 pages --> 73 MB, how can I downsize?

Re: Scanned 670 pages --> 73 MB, how can I downsize?

Post by ncraun »

You can certainly get the file size of the book below 73 MB, even at 600 dpi, especially if the book is only black-and-white text. For example, I have done a 576-page book at 600 dpi with full OCR and bookmarks at a file size of 7.6 MB (5.7 MB without OCR and bookmarks). Even though your book is about 100 pages longer, that should not make the file 10 times larger. Scaling by page count, a quick estimate gives an expected size of maybe 9 MB for your book.
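
To show where the 9 MB figure comes from, here is a minimal sketch of that estimate in Python. It assumes the file size scales linearly with page count, which is only a rough approximation, and the function name is just for illustration.

```python
# Rough estimate: assume file size grows in proportion to page count.
REFERENCE_PAGES = 576      # page count of the reference book
REFERENCE_SIZE_MB = 7.6    # its size with OCR and bookmarks


def estimate_size_mb(pages, ref_pages=REFERENCE_PAGES, ref_size_mb=REFERENCE_SIZE_MB):
    """Scale the reference size by the ratio of page counts."""
    return ref_size_mb * pages / ref_pages


estimated = estimate_size_mb(670)
print(f"Estimated size for 670 pages: {estimated:.1f} MB")
```

The result comes out a little under 9 MB, so a 73 MB file suggests the pages are not being stored as compressed bitonal images.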

I don't use Windows or Adobe Acrobat, so I am unable to help you there. However, I have noticed the PDFs I produce are usually smaller than ones generated in Adobe Acrobat.

First, you will want to process your scanned images with Scan Tailor. It is an amazing piece of software that does incredible post-processing work. Create a project folder containing your scanned images, named so that they sort in page order, then process those images in Scan Tailor. Make sure the output mode is set to Black and White. When you have finished processing the scans, you will end up with a folder of TIFFs.
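
If your scanner produces names that do not sort in page order (for example scan2 ending up after scan10), something like the following Python sketch can copy them into a project folder under zero-padded names. The function names, the "page" prefix, and the folder names in the usage line are hypothetical; it assumes the numbers in the original filenames reflect the page order.

```python
import re
import shutil
from pathlib import Path


def natural_key(name):
    """Split a filename into text and integer chunks so 'scan10' sorts after 'scan9'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def copy_in_page_order(src_dir, dst_dir, prefix="page"):
    """Copy scans from src_dir to dst_dir under zero-padded sequential names."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    scans = sorted((p for p in src.iterdir() if p.is_file()),
                   key=lambda p: natural_key(p.name))
    copied = []
    for number, scan in enumerate(scans, start=1):
        # Keep the original extension, only the base name changes.
        target = dst / f"{prefix}_{number:04d}{scan.suffix.lower()}"
        shutil.copy2(scan, target)
        copied.append(target.name)
    return copied
```

You would call it as copy_in_page_order("raw_scans", "book_project") and then point Scan Tailor at the new folder. It copies rather than renames, so the original scans stay untouched.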

To bind these scanned images together into a PDF, I would recommend the pdfbeads program. To run it, you can just use the command:

pdfbeads *.tiff > output.pdf
pdfbeads will automatically use JBIG2 compression if you have an appropriate JBIG2 encoder installed. JBIG2 is a compression method that is optimized for black and white (bitonal) images, and can massively reduce the file size. I like to use jbig2enc.
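
Since the JBIG2 step depends on an encoder being installed, it can be worth checking for one before binding. Below is a rough Python sketch that looks for the tools on your PATH and then runs the same pdfbeads command as above. It assumes the jbig2enc executable is named jbig2 and that your pages match *.tiff; the function names are made up for this example, so adjust as needed.

```python
import shutil
import subprocess
from pathlib import Path


def missing_tools(tools=("pdfbeads", "jbig2")):
    """Return the names from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def bind_pages(page_dir, output_pdf, pattern="*.tiff"):
    """Run pdfbeads on the sorted page images and write its stdout to output_pdf."""
    page_dir = Path(page_dir)
    pages = sorted(p.name for p in page_dir.glob(pattern))
    if not pages:
        raise FileNotFoundError(f"no files matching {pattern} in {page_dir}")
    with open(output_pdf, "wb") as pdf:
        # Same as: pdfbeads *.tiff > output.pdf, run from inside page_dir
        subprocess.run(["pdfbeads", *pages], cwd=page_dir, stdout=pdf, check=True)
```

Calling missing_tools() first tells you whether the encoder is missing, which would be one possible reason for an unexpectedly large PDF.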

After this you will need to use other programs to add OCR and bookmarks. I'd recommend PDF-XChange Viewer for OCR and JPdfBookmarks for bookmarks.

All of these software packages should run on Windows as well. pdfbeads is written in Ruby, which has a Windows version. PDF-XChange Viewer is Windows software that I run under Wine. JPdfBookmarks is a Java program, so it is cross-platform, and Scan Tailor offers a Windows build on its site. jbig2enc might be tricky to get working; you'll have to cross-compile it with something like MinGW. I can try to help you compile it, but it's been a while since I last used Windows and had to compile like this, so no guarantees.

If you have questions about any of this, please ask me. I've also written a more in-depth tutorial on this process, and I can post it as soon as I'm done proofreading.