When I first started creating pdf files, I noticed that the bitmaps embedded in them had accuracy problems: rasterization made them slightly smaller or larger than intended. After long and extensive testing, I wrote software (in AutoIt3; see the links below for the source code) that uses ImageMagick and Ghostscript to process bitmaps in pdf or tiff files accurately. From the supplied files, the program creates efficiently encoded, resized, compact pdf files that fit exactly on an A4 page and are ready to distribute or print.
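For a rough picture of the pipeline described above, the Python sketch below only builds the two commands for a single page (it does not run them): Ghostscript renders a pdf page to tiff, and ImageMagick resizes the tiff to an exact pixel size and writes it as a pdf. The program's actual commands are not given here, so the function name, the `tiff24nc` device, the Zip compression and the optional grayscale step are assumptions. The 2480 x 3508 default is the 300 dpi A4 size. `convert` is the ImageMagick 6 command name; ImageMagick 7 uses `magick`.

```python
import shlex


def a4_pipeline(src_pdf, page, out_pdf, dpi=300, size=(2480, 3508), gray=False):
    """Build (not run) a hypothetical pdf -> tiff -> A4 pdf command pair for one page."""
    tif = f"page_{page:03d}.tif"  # intermediate tiff (hypothetical naming scheme)
    render = [
        "gs", "-dNOPAUSE", "-dBATCH", "-dSAFER",
        "-sDEVICE=tiff24nc",                      # 24-bit RGB tiff output (assumed choice)
        f"-r{dpi}",                               # rendering resolution
        f"-dFirstPage={page}", f"-dLastPage={page}",
        f"-sOutputFile={tif}", src_pdf,
    ]
    convert = ["convert", tif]
    if gray:
        convert += ["-colorspace", "Gray"]        # optional colorspace conversion
    convert += [
        "-resize", f"{size[0]}x{size[1]}!",       # '!' forces exactly this size, ignoring aspect ratio
        "-units", "PixelsPerInch",
        "-density", str(dpi),                     # 2480 px at 300 ppi is about 21.0 cm
        "-compress", "Zip",                       # assumed compression for the pdf output
        out_pdf,
    ]
    return render, convert


if __name__ == "__main__":
    for cmd in a4_pipeline("scan.pdf", 1, "page_001.pdf"):
        print(shlex.join(cmd))
```

Keeping the target size as an explicit parameter makes the resize step independent of whatever size Ghostscript happened to render, which is the point where a one-pixel error would otherwise carry through.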
When pdf files are supplied instead of tiff files, a special algorithm in the program improves accuracy when converting them to tiff (an intermediate format needed for resizing and colorspace conversion). Consider a pdf file with exactly the size of an A4 page (21.0 x 29.7 cm) holding a 300 dpi bitmap of 2480 x 3508 pixels. In theory the bitmap should fit exactly, but with some pdf files the standard tools such as Acrobat or Ghostscript produce 2479 or 2481 pixels horizontally instead of 2480.

My solution is to compare the BoundingBox that Ghostscript reports for the pdf with its HiResBoundingBox, and to compare both with the dimensions of the tiff that ImageMagick produces. If the page dimensions differ by 1 pixel or more from what a standard A4 page should give, a special routine forces the pdf-to-tiff conversion again, this time with an explicitly specified number of pixels. If the page was originally 2480 pixels wide, this step may now reproduce the original 2480 pixels. If the result is still 2479 or 2481 pixels, the bitmap embedded in the original pdf very likely had 2479 or 2481 pixels horizontally.
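The check described above can be sketched in Python as follows. This is not the author's AutoIt3 code. It is a minimal illustration assuming the BoundingBox and HiResBoundingBox values come from Ghostscript's `bbox` output device (`gs -dNOPAUSE -dBATCH -sDEVICE=bbox file.pdf`). That device prints lines such as `%%BoundingBox: 0 0 595 842` (typically on stderr) and reports the box of the marked content, which equals the page for a full-page scan. The function names and the use of `-g` together with `-dPDFFitPage` for the forced conversion are assumptions for illustration. The sample values show how the integer BoundingBox of an A4 page (595 x 842 pt) on its own gives 2479 pixels at 300 dpi, while the HiResBoundingBox gives 2480.

```python
import re

POINTS_PER_INCH = 72.0
CM_PER_INCH = 2.54


def a4_pixels(dpi=300):
    """Expected pixel size of an A4 page (21.0 x 29.7 cm) at `dpi`, rounded to whole pixels."""
    return (round(21.0 / CM_PER_INCH * dpi), round(29.7 / CM_PER_INCH * dpi))


def parse_bbox(gs_output):
    """Return (bbox, hires_bbox) for the first page in Ghostscript bbox-device output.

    Each box is (x0, y0, x1, y1) in PostScript points."""
    def grab(tag):
        m = re.search(r"^%%" + tag + r":\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+)",
                      gs_output, re.MULTILINE)
        if m is None:
            raise ValueError(f"no {tag} line found")
        return tuple(float(v) for v in m.groups())
    return grab("BoundingBox"), grab("HiResBoundingBox")


def box_to_pixels(box, dpi=300):
    """Convert a box in points to (width, height) in whole pixels at `dpi`."""
    x0, y0, x1, y1 = box
    return (round((x1 - x0) / POINTS_PER_INCH * dpi),
            round((y1 - y0) / POINTS_PER_INCH * dpi))


def needs_forced_conversion(measured, expected, threshold=1):
    """True if either dimension deviates by `threshold` pixels or more."""
    return any(abs(m - e) >= threshold for m, e in zip(measured, expected))


def forced_gs_command(pdf_path, tif_path, size, dpi=300):
    """Hypothetical re-render into a fixed pixel canvas (-g) with the page scaled to fit."""
    width, height = size
    return ["gs", "-dNOPAUSE", "-dBATCH", "-dSAFER", "-sDEVICE=tiff24nc",
            f"-r{dpi}", f"-g{width}x{height}", "-dPDFFitPage",
            f"-sOutputFile={tif_path}", pdf_path]


if __name__ == "__main__":
    sample = ("%%BoundingBox: 0 0 595 842\n"
              "%%HiResBoundingBox: 0.000000 0.000000 595.275574 841.889771\n")
    bbox, hires = parse_bbox(sample)
    expected = a4_pixels(300)                 # (2480, 3508)
    measured = box_to_pixels(bbox, 300)       # (2479, 3508): one pixel short
    print(expected, measured, box_to_pixels(hires, 300))
    if needs_forced_conversion(measured, expected):
        print(forced_gs_command("in.pdf", "out.tif", expected))
```

In practice `measured` would come from the tiff that ImageMagick actually produced (for example via `identify`) rather than from the bounding box alone. The sketch uses the bounding box only to keep the example self-contained.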
Please see the links below for my typical workflow, the software and source code I use for it, and where ScanTailor fits in.
Explanation of the scanning workflow: http://www.auditeon.com/software:pdfprocessing
A screencast showing the steps can be seen here: http://www.auditeon.com/xyz/webcast/Sca ... icDemo.htm
Software to autonomously resize and/or convert an arbitrary pdf or tif file to an exactly A4-sized pdf file: http://www.auditeon.com/software:pdfpro ... stallation
Software to extract tiff files from a pdf file which can be directly imported by ScanTailor: http://www.auditeon.com/software:pdfpro ... tif_300dpi
To reorder pdf pages, if necessary, one can use the application pdfsam.
I hope this helps someone. Maybe the code will inspire others to develop it further.
====== Update information ======
* 22-07-2011: The program has been renamed to MakePDF
* 11-07-2011: The program has been updated to v0.8c with new features:
  * MakePDF now helps correct the wrong placement of odd and even pages. With the -q option (specified by renaming MakePDF.exe to "MakePDF -q.exe"), it appends as many empty pages as needed to make the page count a multiple of 4. This is particularly practical when the target format is a booklet or brochure (A4 pages printed four at a time on double-sided A3 sheets). As a final manual step, the empty pages only need to be moved to the right locations.
  * The program now also accepts an option to specify the output resolution (300 dpi by default). Please see the link above for more information.
  * Added functionality to process tiff files directly into pdf.
  * Cleaned up the source code.
  * Fixed an error that could lead to slightly wrong bitmap dimensions (+/- 1 pixel).
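The padding done by the -q option amounts to simple arithmetic: an n-page document needs (-n mod 4) empty pages appended. The Python sketch below shows that calculation. For illustration only, it also shows the standard saddle-stitch order in which such pages land on double-sided A3 sheets, which is why a multiple of 4 is required. MakePDF is not described as performing this imposition itself, and both function names are hypothetical.

```python
def blank_pages_needed(page_count, multiple=4):
    """Number of empty pages to append so the page count becomes a multiple of `multiple`."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    return -page_count % multiple


def booklet_sheets(page_count):
    """Saddle-stitch page order after padding.

    Returns one ((left, right), (left, right)) tuple per A3 sheet: the front side
    first, then the back side, using 1-based page numbers. Pages beyond the
    original `page_count` are the appended empty pages."""
    n = page_count + blank_pages_needed(page_count)
    sheets = []
    for i in range(n // 4):
        front = (n - 2 * i, 2 * i + 1)
        back = (2 * i + 2, n - 2 * i - 1)
        sheets.append((front, back))
    return sheets


if __name__ == "__main__":
    print(blank_pages_needed(5))   # 3 empty pages bring 5 pages up to 8
    print(booklet_sheets(5))
```

For a 5-page document, pages 6 to 8 are the empty ones. Moving them to their intended positions is the manual step mentioned above.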