
HOW TO COMPRESS PDF FILES

LMB

Post by LMB » 07 Jan 2011, 21:32

Hi,

My sincere congratulations on this wonderful site. I discovered your work just a few days ago and was astonished by the marvelous photographic scanners you build.

As I saw at the Internet Archive, PDF files are compressed in such a way that when you open them with OCR software like ABBYY FineReader 10 Professional and rebuild them, they grow to a huge size. My question is: what do you do, or what software do you use, to compress those digitized files so efficiently?

As I said, I use ABBYY FineReader 10, and the only way I have found to compress a scanned book as a PDF (the most widely used format) is to convert it to black and white, or to grayscale with medium image quality. This is better than nothing, but I need to find a more efficient way to do it.

I'd appreciate any help you can give me on this topic.
Many thanks, and all the luck in the world to you and the DIY Book Scanner project.
LMB

spamsickle
Posts: 596
Joined: 06 Jun 2009, 23:57

Re: HOW TO COMPRESS PDF FILES

Post by spamsickle » 08 Jan 2011, 10:46

I don't know what Internet Archive is doing, but my guess would be that they're building a PDF from an existing text file. Since you wouldn't need to open such a file in OCR software (the text would already be there), my guess is probably incorrect.

I use Adobe Acrobat version 9 or higher when I want compression. It has a "Clearscan" option, which creates a custom font and vectorizes the text with it while it does the OCR. For my purposes, the OCR is acceptable -- I'm not using a text-to-voice application to read to me, just doing an occasional search on the generated text. Clearscan also produces smoother-looking characters than the original Scan Tailor output, and works better with the Scan Tailor output than it does with the original JPEGs.

LMB

Re: HOW TO COMPRESS PDF FILES

Post by LMB » 08 Jan 2011, 18:50

Hi
Thank you very much for your answer. The «Clearscan» function of Adobe you are talking about is only in the full version, isn't it? I ask because I can't find this function in the Adobe 9 version I'm using.

Many thanks once again.

spamsickle
Posts: 596
Joined: 06 Jun 2009, 23:57

Re: HOW TO COMPRESS PDF FILES

Post by spamsickle » 08 Jan 2011, 19:17

I think it's in all versions, but I might be wrong. In the version I'm using, you click on the "Document" menu, then "OCR Text Recognition -> Recognize Text Using OCR". That should display a popup, with settings. If the PDF Output Style is not Clearscan, click "Edit" and select that as the output style.

If that doesn't get it for you, I probably can't help further. It may be that you don't have it, but you should direct your question to Adobe before giving up.

Mandor
Posts: 24
Joined: 28 Jul 2009, 01:27
E-book readers owned: lBook V8, lBook V3
Number of books owned: 0
Location: Sofia, Bulgaria

Re: HOW TO COMPRESS PDF FILES

Post by Mandor » 10 Jan 2011, 03:08

@LMB
The size of the PDF files produced by ABBYY FineReader depends on the export settings: text and pictures only, whole page image with text over it… At least two of the options produce a PDF in which every page is a graphic image of the whole page, with the OCRed text placed over or under that image.

emmerick

Re: HOW TO COMPRESS PDF FILES

Post by emmerick » 17 Jan 2011, 06:37

What is the average size of the PDF files of the books you scan? A 700-page book I scanned comes to around 90 to 100 MB. Can that be reduced? Thanks

Gerard
Posts: 154
Joined: 17 Oct 2010, 07:15
Number of books owned: 0
Location: Berlin (Germany)

Re: HOW TO COMPRESS PDF FILES

Post by Gerard » 17 Jan 2011, 08:42


emmerick

Re: HOW TO COMPRESS PDF FILES

Post by emmerick » 17 Jan 2011, 08:50


This is for Linux. I use Windows :( Thanks

emmerick

Re: HOW TO COMPRESS PDF FILES

Post by emmerick » 17 Jan 2011, 13:24

I was doing some testing here and I concluded: simply compressing the PDF makes the quality too bad. The only way to get a good result is to run it through OCR (I used ABBYY 10). A file that was previously 100 MB came to 2 MB after OCR. The only drawback is that it is a little more work, because of the headers and footers and a few words that are not recognized.
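
To put those figures in per-page terms, here is a rough sketch of the arithmetic. It assumes the 700-page count mentioned earlier in the thread applies to this 100 MB file and takes 1 MB as 1024 KB; the function name is made up for illustration.

```python
def per_page_kb(total_mb: float, pages: int) -> float:
    """Average size per page in kilobytes, taking 1 MB as 1024 KB."""
    return total_mb * 1024 / pages


# Figures quoted in the thread; the page count is an assumption
# carried over from the earlier 700-page example.
before_mb, after_mb, pages = 100, 2, 700

print(f"before OCR export: {per_page_kb(before_mb, pages):.1f} KB/page")
print(f"after OCR export:  {per_page_kb(after_mb, pages):.1f} KB/page")
print(f"reduction factor:  {before_mb / after_mb:.0f}x")
```

About 146 KB per page is in the range of a compressed page image, while about 3 KB per page is only plausible if the page is stored as text rather than as a picture.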

mellow-yellow
Posts: 46
Joined: 28 Jun 2010, 13:33
Number of books owned: 1
Country: USA
Location: Portland, OR, USA

Re: HOW TO COMPRESS PDF FILES

Post by mellow-yellow » 17 Jan 2011, 15:49

A few options:

1. Adobe Acrobat (excluding the free Reader): http://www.websiteoptimization.com/spee ... mizer.html
2. OmniPage or ABBYY: Export or save your PDF without images (text only). Of course, OCR errors reduce legibility.
3. Source images: Reduce the source file resolution, convert color to B/W or grayscale, reduce the number of color/grayscale values
4. Print to a PDF with PDFCreator or equivalent
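
To get a feel for how much option 3 can matter, here is a rough sketch of the uncompressed size of one page image at different resolutions and bit depths. The 6 x 9 inch page size and the function name are assumptions for illustration only. Real PDFs compress the images further (JPEG for color and gray, CCITT or JBIG2 for black and white), so actual files will be much smaller, but the ratios between the settings give a useful first impression.

```python
def raw_page_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of one page image."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# Hypothetical 6 x 9 inch book page.
WIDTH_IN, HEIGHT_IN = 6, 9

settings = [
    ("300 dpi, 24-bit color", 300, 24),
    ("300 dpi, 8-bit gray (256 levels)", 300, 8),
    ("300 dpi, 4-bit gray (16 levels)", 300, 4),
    ("300 dpi, 1-bit B/W", 300, 1),
    ("150 dpi, 8-bit gray (256 levels)", 150, 8),
]

for label, dpi, bpp in settings:
    size_mb = raw_page_bytes(WIDTH_IN, HEIGHT_IN, dpi, bpp) / 1_000_000
    print(f"{label:34s} {size_mb:6.2f} MB per page")
```

Going from 24-bit color to 1-bit B/W cuts the raw data by a factor of 24, and halving the resolution cuts it by a factor of 4, before any compression is applied.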
