Well, I've had a look at two of your files, Fisher and Bristol...
I wanted to open the pdf files in a text editor to inspect them (not that I have more than a very small subset of knowledge of pdf) but the file sizes made that fairly impractical. I then decided to extract some representative pages to inspect the coding, but as I don't have a modern version of Adobe Acrobat, I had to use other software and assume that the coding of the extracted pages was unaltered, which I think is likely.
Inspecting the code in the extracted pages, I was able to determine that the large number of pages consisting just of text were compressed using JBIG2, which is the most efficient compression method for black and white images supported in the pdf standard. However, I'm not entirely sure whether Acrobat uses lossless encoding, lossy encoding, or offers a choice. I have the user's manual but am not sure if I'd be any the wiser if I looked in it.
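For anyone who wants to repeat this check without opening a large file in a text editor, here is a minimal Python sketch that counts the standard pdf stream filter names in the raw bytes of a file. It is only a rough scan, not a proper pdf parser, so treat the counts as an indication rather than an exact tally; the function name and the command-line usage are my own illustration, not part of any tool mentioned above.

```python
import re
import sys
from collections import Counter

# Standard pdf stream filter names and the compression each one indicates.
FILTERS = {
    b"JBIG2Decode": "JBIG2 (black and white images)",
    b"CCITTFaxDecode": "CCITT fax (black and white images)",
    b"DCTDecode": "JPEG",
    b"JPXDecode": "JPEG2000",
    b"FlateDecode": "Flate/zlib (general purpose)",
}

def count_filters(data: bytes) -> Counter:
    """Count occurrences of each filter name token in raw pdf bytes."""
    counts = Counter()
    for name in FILTERS:
        # A name token starts with "/" and must not run on into more letters or digits.
        pattern = rb"/" + re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode()] = len(re.findall(pattern, data))
    return counts

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python count_filters.py extracted_pages.pdf
    with open(sys.argv[1], "rb") as f:
        found = count_filters(f.read())
    for name, meaning in FILTERS.items():
        print(f"{name.decode():16} {found[name.decode()]:6}  {meaning}")
```

A high JBIG2Decode count alongside a handful of DCTDecode entries would match what I saw in the extracted pages. Note that this shows which filter is used, not whether the JBIG2 encoding was lossless or lossy.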
The large number of text pages with colour highlighting were efficiently handled, with a relatively small increase in file size due to the colour on the page, which is interesting and shows the power of the pdf format.
The small number of colour images in the above books were encoded using JPEG compression in my extracted pages, but that may be due to a limitation in the software I used, which doesn't support JPEG2000 (j2k) compression. JPEG2000 would produce slightly smaller images, but the effect on the total file size would be small unless a book had many illustrations.
So, as far as I can see, your existing files are already close to optimally compressed for the content they contain, the only question being whether JPEG2000 compression of colour images is enabled.
To reduce the filesize further, one option would be to downsample the images to a lower resolution, if the quality loss is acceptable, remembering that the text pages are images too. Another would be to test the Adobe ClearScan option, which could produce a substantial filesize reduction while actually improving on the already quite acceptable text quality.
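To give a feel for what downsampling does, here is a small Python sketch of the arithmetic. The page size and resolutions are made-up figures for illustration, since I don't know what resolution your scans were made at, and the compressed file size will not shrink in exact proportion to the pixel count, so this is only a rough guide.

```python
def pixel_count(width_in: float, height_in: float, dpi: int) -> int:
    """Number of pixels in a page image of the given size (inches) and resolution."""
    return round(width_in * dpi) * round(height_in * dpi)

def downsample_ratio(old_dpi: int, new_dpi: int) -> float:
    """Fraction of the original pixels left after changing resolution."""
    # Resolution applies in both directions, so the pixel count scales with the square.
    return (new_dpi / old_dpi) ** 2

# Hypothetical 6 x 9 inch book page, scanned at 600 dpi and downsampled to 300 dpi.
before = pixel_count(6, 9, 600)
after = pixel_count(6, 9, 300)
print(before, after, downsample_ratio(600, 300))
```

Halving the resolution leaves a quarter of the pixels, which is why downsampling can make a big difference, and also why small characters such as superscripts and accents suffer first.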
The above assumes that Acrobat doesn't use JBIG2 lossy, which I think is likely: if that option is available but not selected, it should certainly be tried, remembering that like ClearScan it is a lossy process that could in principle replace a misidentified character by a perfect rendering of another character. But the original scans are generally of good quality, except for some curvature of a few text lines, and the nature of the books could probably tolerate rare errors. In the worst case a name or date could possibly be incorrect, though.
Thinking about the potential reduction in filesize that JBIG2 lossy might bring, I've run some quick tests using Abbyy FineReader 12 which has the option. I basically opened one of your pdf files, ran the OCR process to give searchability, and saved the result as a pdf file using a number of different settings. I obtained some reduction in filesize, although not quite as great as I expected, possibly in part due to the file size of the photograph images which remained unchanged. In passing, FR12 produces more accurate OCR than Acrobat, although as your images are of reasonable quality the difference may not be great.
The original Fisher... pdf was 14.2MB and I hoped to at least halve the size, but to do that I had to use almost the maximum JBIG2 lossy setting. At a quick look the text still looks good, but you should inspect it very carefully if it is of interest, looking particularly at smaller characters such as superscripts, accents and the text where the baseline is curved. Incidentally, I OCR’ed with Spanish selected having read your comment above concerning correct recognition of accents.
If Adobe ClearScan doesn’t work for you, bearing in mind the possibility as with JBIG2 lossy that a misrecognised character could be replaced by a perfect copy of another character, you could try using FineReader for any future scans, or for creating a searchable pdf file from camera images. You would have to spend some time learning a new user interface and finding the optimum settings to use, though. Independent tests have shown that FineReader produces more accurate OCR results than Acrobat, although the difference may not be great for your scans as the images are good quality. If you start using a camera it could be much more of a consideration.
dtic wrote: Since hard drive space is very, very inexpensive nowadays file size doesn't matter much for many use cases.
I tend to agree with dtic that with 1TB and 2TB drives currently readily available, and looking to the future, the size of the 20GB of files you have now shouldn’t really be a serious concern...