So I've been scanning several hundred books using Simple Scan, then Scan Tailor Experimental, and then pdfbeads to bind. I was having issues where the final .pdf was in some cases significantly larger than the original scans, and in other cases 1/10 the size of the raw scans. I tracked the issue down to an unexpected interaction between Scan Tailor Experimental and pdfbeads with regard to color/grayscale images.
Specifically, my scanned images were being scanned at 600dpi, and the .pngs that simple scan produces are appropriately tagged as 600dpi in the file. After feeding them through scan tailor experimental, the dpi information appears to be stripped out, and the resulting .tif files have no DPI in the file. Pdfbeads, by default, targets a 300DPI quality for output images, but it appears to assume that an input image with no dpi tag has a dpi of 72 (presumably analogous to the way GIMP and imagemagick assume a 72 DPI). I confirmed this by passing pdfbeads a series of different target DPI parameters, then changing the file tags and trying again.
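One way to check whether a file actually carries a DPI tag is to ask ImageMagick what it reads from the header. This is only a minimal sketch, assuming ImageMagick's identify is on the PATH (on ImageMagick 7 it may need to be called as "magick identify"); report_density is a hypothetical helper name, not part of any tool mentioned here:

```shell
# Hypothetical helper: print the resolution ImageMagick reads from each file.
# %f = file name, %x / %y = horizontal / vertical resolution, %U = resolution units.
report_density() {
    identify -format '%f: %x x %y %U\n' "$@"
}

# Usage (file names are placeholders):
#   report_density scan_0001.png out/scan_0001.tif
```

Running it on the scanner's .png and on the matching Scan Tailor .tif should show whether the resolution survived. What gets printed for an untagged file may depend on the ImageMagick version, so it is safer to compare the two lines than to look for one specific value.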
So this led to an issue where scan tailor stripping out the DPI was causing pdfbeads to try to "improve" the image quality of the "72 DPI" images, leading to gigantic filesizes (I had one .pdf produced that was over 1GiB, when the source scans were less than 300MB). To be clear, the missing DPI isn't just an issue for pdfbeads, it also causes issues with imagemagick and some other compression tools I tried.
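To put rough numbers on this, here is a small sketch. It assumes a tool rescales each image by (target DPI / assumed input DPI), which is a guess based on the behaviour described above and not something taken from the pdfbeads source. scale_factor and pixel_factor are hypothetical helper names:

```shell
# Hypothetical helpers, assuming resampling by (target DPI / assumed input DPI).
scale_factor() {
    # $1 = DPI the tool assumes for the input, $2 = target DPI
    awk -v in_dpi="$1" -v out_dpi="$2" 'BEGIN { printf "%.2f\n", out_dpi / in_dpi }'
}

pixel_factor() {
    # Pixel count grows with the square of the linear scale factor.
    awk -v in_dpi="$1" -v out_dpi="$2" 'BEGIN { r = out_dpi / in_dpi; printf "%.2f\n", r * r }'
}

scale_factor 72 300    # untagged file treated as 72 DPI: 4.17 (upsampling)
pixel_factor 72 300    # 17.36 times as many pixels
scale_factor 600 300   # correctly tagged 600 DPI scan: 0.50 (downsampling)
pixel_factor 600 300   # 0.25 times as many pixels
```

Under that assumption the untagged file ends up with roughly 69 times as many pixels as the correctly tagged one (17.36 / 0.25), which would be consistent with the output ballooning past the size of the source scans.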
I wanted to post for two reasons:
1) I spent about a week tinkering around with different tools trying to figure out the issue, before I stumbled onto the solution, and I thought some folks here might find the info useful (the workaround is, if you want a 300dpi output, to pass pdfbeads "-B 72" or to manually set the appropriate DPI in other software, both methods appear to produce the same size output)
2) to ask Tulon if Scan Tailor experimental could have an option to set a file DPI (or at least to preserve existing information instead of stripping it); I understand that the reason DPI settings were removed was due to user confusion, but removing valid DPI info requires the user to be aware of potential downstream consequences and to manually set the DPI for files in order for other image conversion or bundling tools to work correctly.
BTW, I really appreciate Scan Tailor experimental; it's an immensely useful program. I'm scanning several hundred paperbacks, and I couldn't do it without Scan Tailor. Tulon, I was looking for a way to send a donation your way, but couldn't find anyplace to send it on the github page? If you're accepting contributions, just let me know where to send it.
Scan Tailor Experimental no-DPI causing issues with pdfbeads and imagemagick
Moderator: peterZ
Re: Scan Tailor Experimental no-DPI causing issues with pdfbeads and imagemagick
alraban,
If you know what the DPI of your images is, you can input that DPI into the files themselves with a quick imagemagick command.
If you are doubling the resolution with scantailor (output stage setting) and have a 1200 DPI final output:
mogrify -density 1200 -units PixelsPerInch *.tif
Or if scantailor is just outputting 1 to 1 for a 600 DPI output:
mogrify -density 600 -units PixelsPerInch *.tif
That will define the resolution of all tifs in your directory. Hopefully then PDFBeads will deal with it properly.
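If it helps to avoid typing the wrong density, the value can be derived from the scanner DPI and the Scan Tailor enhancement factor. A minimal sketch, assuming the factor is a whole number (1 or 2 in the cases above); tag_command is a hypothetical helper that only prints the command so it can be checked before running:

```shell
# Hypothetical helper: print (not run) the mogrify command for a given
# scanner DPI and Scan Tailor enhancement factor (assumed to be a whole number).
tag_command() {
    # $1 = scanner DPI, $2 = enhancement factor (1 = no change, 2 = doubled)
    density=$(( $1 * $2 ))
    printf 'mogrify -density %s -units PixelsPerInch *.tif\n' "$density"
}

tag_command 600 2   # prints the 1200 DPI command
tag_command 600 1   # prints the 600 DPI command
```

Keep in mind that mogrify rewrites the files in place, so it is worth trying it on a copy of the directory first.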
Re: Scan Tailor Experimental no-DPI causing issues with pdfbeads and imagemagick
alraban wrote: 2) to ask Tulon if Scan Tailor experimental could have an option to set a file DPI (or at least to preserve existing information instead of stripping it);
It's not impossible, though I am not working on ST at the moment.
alraban wrote: BTW, I really appreciate Scan Tailor experimental; it's an immensely useful program. I'm scanning several hundred paperbacks, and I couldn't do it without Scan Tailor. Tulon, I was looking for a way to send a donation your way, but couldn't find anyplace to send it on the github page? If you're accepting contributions, just let me know where to send it.
I used to accept donations, though I no longer do. You see, legally I have to pay taxes on them. At some point the figures went so low that it wasn't even worth the bother.
Scan Tailor experimental doesn't output 96 DPI images. It's just what your software shows when DPI information is missing. Usually what you get is input DPI times the resolution enhancement factor.
Re: Scan Tailor Experimental no-DPI causing issues with pdfbeads and imagemagick
Tulon wrote: It's not impossible, though I am not working on ST at the moment.
Tulon wrote: I used to accept donations, though I no longer do. You see, legally I have to pay taxes on them. At some point the figures went so low that it wasn't even worth the bother.
I understand completely on both counts, and I appreciate the reply. If you change your mind about donations (or set up a bounty source type thing for feature requests), let me know. Given that I've gotten a tremendous amount of use out of your software (more than some quite expensive scanning related software that shall remain nameless), I'd be glad to kick in $100.
I try to support open source projects when I can, but I'm not a programmer so all I have to contribute is donations. Perhaps unsurprisingly, though, I've often found that open source developers aren't that interested in money.
Re: Scan Tailor Experimental no-DPI causing issues with pdfbeads and imagemagick
I used Scan Tailor Experimental for the first time a couple of days ago and noticed a problem in a PDF document that I'd created from the TIFs in Adobe Acrobat Pro XI. The OCR function didn't work anymore. After "reconverting" the TIFs to TIF using Irfanview and creating the PDF document again, OCR worked again.
I never had this problem with the regular version of Scan Tailor (which is indeed an immensely useful program that I used hundreds of times for books and magazines). I hope that one day this problem can be fixed. The Experimental version is so much faster than the old version!