spamsickle wrote: I'm happy to share, but it sounds like what you're doing and what I'm doing are like apples and oranges. I've never seen Clearscan inflate my filesize, but I'm not starting with filesizes anything close to what you're seeing.

By "inflating the file size," I suppose I mean relative to what the file size would be if all the text were in one font, e.g. Times New Roman. In that case there would be only one embedded font, which would mean a substantially smaller file. Running the document through Acrobat X's OCR/ClearScan creates MANY fonts, which results in a much larger file. After you ClearScan your document, look in the document properties under Fonts and you'll see what I'm talking about. In the example below, my most recent book, I have the font properties of the 84pg book I did today. Putting this 84pg book through Acrobat X's OCR/ClearScan resulted in 255 created fonts, apparently because ClearScan didn't recognize the font or because of anomalies introduced by the post-processing.
Now, don't get me wrong, I am very happy with ClearScan's results, especially since it renders the text very accurately - I haven't run across any misspellings yet. However, I believe that if the font were uniform, the 5.6MB file would be even smaller. In fact, I tried editing the text to change the created fonts to something like Times New Roman, and it resulted in lots of misspellings as well as some format changes. If that's my only other option, I'll live with (what I consider) a larger file size. I'm just trying to streamline as much as possible since I'd like to carry many of my books on my 32GB iPad.
spamsickle wrote: To take a recent example: I start with a 430-page book (including front and back covers) which is 1 GB of JPEGs. It's mostly text, with a few greyscale images and a few more b/w images. The front and back covers are in color. After it's been through Scan Tailor, it's 572 MB of TIFs. Everything is mixed mode except the covers, which are color. ImageMagick's mogrify converts from TIF to 532 MB of PDFs, which PDFTK stitches into a 532 MB book. Running that through Clearscan gives me a 19 MB book.

My most recent example: I start with an 84-page book (including a full-color front cover; I usually don't worry about the back cover) which is 225MB of JPEGs. It's mostly text, with two greyscale images, no b/w images. The front cover is in full color.
1) I first take the 225MB of JPGs and pre-process them in Adobe Lightroom 3 and export as TIFs, which yields 831MB of TIFs.
2) I import that 831MB of TIFs into ScanTailor and output 46MB of TIFs.
3) I import that 46MB of TIFs from ScanTailor into Acrobat X, ClearScan it, and add the 2.5MB JPG color cover (48.5MB of input in total) to yield a 5.6MB PDF.
Granted...considering those numbers, a 5.6MB PDF is a drop in the bucket, but I know on the other hand that if those fonts were uniform, the file size would be even lower. (No way to get there yet, though.) So I'm happy with what I have for the moment...just trying to make it better and smaller.
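To put the numbers from my three steps side by side, here's a rough Python sketch. The sizes are the ones I quoted above; the stage labels and the size_report helper are just names I made up for illustration.

```python
# Sizes in MB at each stage of my workflow, as quoted in the post.
# The labels and the helper name are illustrative, not part of any tool.
STAGES = [
    ("JPEGs (original scans)", 225.0),
    ("TIFs exported from Lightroom 3", 831.0),
    ("TIFs output by ScanTailor", 46.0),
    ("ScanTailor TIFs + 2.5MB JPG cover", 48.5),
    ("Final ClearScan PDF", 5.6),
]


def size_report(stages):
    """Return (name, size, ratio_vs_previous, ratio_vs_first) for each stage."""
    first_size = stages[0][1]
    rows = []
    previous = None
    for name, size in stages:
        step_ratio = size / previous if previous is not None else 1.0
        rows.append((name, size, step_ratio, size / first_size))
        previous = size
    return rows


for name, size, step_ratio, overall in size_report(STAGES):
    print(f"{name:36s} {size:7.1f} MB  x{step_ratio:5.2f} vs previous  "
          f"{overall:7.1%} of original")
```

By this arithmetic the final PDF comes to roughly 2.5% of the original 225MB of JPEGs.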
spamsickle wrote: Now, granted that's more than 236 pages, but if 15 MB seems large to you, I don't think we're on the same page. I'd be interested in hearing what your process and data sizes are down the line too; I may be doing something stupid. One thing I'm doing that you may not be is preserving a completely full-color image of the front and back cover. My front cover accounts for 53 MB, and back cover 31 MB of my uncompressed book.

Those full-color front and back covers sure do seem large. Are they TIF or JPG (surely not JPG, huh)? I tweak my full-color cover in LR3 and export it as a high-quality JPG, which usually results in an image between 2 and 5MB. I then insert that as the cover page of the PDF at the very end of my workflow, save everything to PDF once again, and I'm done. What makes your images so large? If they are TIFs, you might try JPG. Seems to work fine for me. You can download my latest book here if you'd like to get a look at the final product.
Someone should mention to Daniel that there should be a place where we can upload completed works just to download and get a look at so we can ask each other questions about processing, workflow, etc.
spamsickle wrote: I know PDFs created from the ground up, with a single font and I assume some professional compression tweaking on the images can come in under 5 MB. While that would be nice, I'm not trying to carry a library on my phone, so 20 or even 50 MB is satisfactory for me. All my books are on a hard drive or a DVD, and even at 50 MB that means I can put 80 books on a single disk.

For those purposes, you've got all you need. However, as I continue to convert my actual library into a virtual library on my iPad, space/filesize becomes a concern.
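For anyone who wants to check the storage arithmetic in this exchange, here's a quick Python sketch. The book sizes come from the posts above. The usable iPad capacity is purely my assumption (a 32GB device won't have all 32GB free), and the DVD figure is the nominal single-layer capacity, so treat the output as ballpark only.

```python
def books_that_fit(capacity_mb, book_mb):
    """Whole number of books of size book_mb that fit in capacity_mb."""
    return int(capacity_mb // book_mb)


# ASSUMPTION: about 28,000 MB actually free on a 32GB iPad (hypothetical figure).
IPAD_USABLE_MB = 28_000
# Nominal capacity of a single-layer DVD (4.7 GB), before filesystem overhead.
DVD_MB = 4_700

# Per-book sizes quoted in the thread.
BOOK_SIZES_MB = [
    ("my 84pg ClearScan book", 5.6),
    ("spamsickle's 430pg ClearScan book", 19.0),
    ("spamsickle's 'even at 50 MB' case", 50.0),
]

for label, size_mb in BOOK_SIZES_MB:
    on_ipad = books_that_fit(IPAD_USABLE_MB, size_mb)
    on_dvd = books_that_fit(DVD_MB, size_mb)
    print(f"{label:36s} {size_mb:5.1f} MB -> {on_ipad:5d} on iPad, {on_dvd:4d} on DVD")
```

The 80-books-per-disk figure lines up with this: 80 books at 50 MB is 4,000 MB, which fits on one single-layer DVD with some room to spare.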