
Scan Tailor Experimental no-DPI causing issues with pdfbeads and imagemagick

alraban
Posts: 3
Joined: 17 Aug 2016, 08:08
Number of books owned: 1500
Country: USA

Scan Tailor Experimental no-DPI causing issues with pdfbeads and imagemagick

Post by alraban » 17 Aug 2016, 12:45

So I've been scanning several hundred books with a Simple Scan --> Scan Tailor Experimental --> pdfbeads workflow (pdfbeads does the binding). I was having issues where the final .pdf was in some cases significantly larger than the original scans, and in other cases 1/10 the size of the raw scans. I tracked the issue down to an unexpected interaction between Scan Tailor Experimental and pdfbeads with regard to color/grayscale images.

Specifically, my pages were being scanned at 600 DPI, and the .pngs that Simple Scan produces are correctly tagged as 600 DPI in the file. After feeding them through Scan Tailor Experimental, the DPI information appears to be stripped out, and the resulting .tif files carry no DPI tag. Pdfbeads, by default, targets 300 DPI for output images, but it appears to assume that an input image with no DPI tag is 72 DPI (presumably analogous to the way GIMP and imagemagick fall back to 72 DPI). I confirmed this by passing pdfbeads a series of different target DPI parameters, then changing the file tags and trying again.

So this led to an issue where scan tailor stripping out the DPI was causing pdfbeads to try to "improve" the image quality of the "72 DPI" images, leading to gigantic filesizes (I had one .pdf produced that was over 1GiB, when the source scans were less than 300MB). To be clear, the missing DPI isn't just an issue for pdfbeads, it also causes issues with imagemagick and some other compression tools I tried.
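
To put rough numbers on the effect described above, here is a minimal Python sketch. It assumes that a DPI-targeting tool scales each image by target DPI divided by the DPI it believes the input has, and that it falls back to 72 DPI for untagged files; both are inferences from the behavior observed in this post, not documented pdfbeads internals. The function name and the US Letter page size are purely illustrative.

```python
def resampled_size(width_px, height_px, target_dpi, tagged_dpi=None, fallback_dpi=72):
    """Pixel size after resampling to target_dpi.

    Assumption: an untagged image (tagged_dpi=None) is treated as
    fallback_dpi, as the post infers for pdfbeads.
    """
    assumed_dpi = tagged_dpi if tagged_dpi else fallback_dpi
    return (round(width_px * target_dpi / assumed_dpi),
            round(height_px * target_dpi / assumed_dpi))


# Illustrative page: US Letter (8.5 x 11 in) scanned at 600 DPI.
page = (5100, 6600)

# Correctly tagged: 600 -> 300 DPI halves each dimension.
with_tag = resampled_size(*page, target_dpi=300, tagged_dpi=600)

# Tag stripped: the tool believes it is going 72 -> 300 DPI and upsamples.
without_tag = resampled_size(*page, target_dpi=300)

print(with_tag)      # (2550, 3300)
print(without_tag)   # (21250, 27500)

# Ratio of total pixels between the two outcomes.
pixel_ratio = (without_tag[0] * without_tag[1]) / (with_tag[0] * with_tag[1])
print(round(pixel_ratio, 1))
```

Under those assumptions the untagged page ends up with roughly 70 times as many pixels as the correctly tagged one, which would be consistent with output files ballooning past the size of the source scans.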

I wanted to post for two reasons:

1) I spent about a week tinkering with different tools trying to figure out the issue before I stumbled onto the solution, and I thought some folks here might find the info useful. The workaround, if you want a 300dpi output, is either to pass pdfbeads "-B 72" or to manually set the appropriate DPI in other software; both methods appear to produce the same size output.
2) to ask Tulon if Scan Tailor experimental could have an option to set a file DPI (or at least to preserve existing information instead of stripping it); I understand that the reason DPI settings were removed was due to user confusion, but removing valid DPI info means the user has to be aware of potential downstream consequences and manually set the DPI for files before other image conversion or bundling tools will work correctly.

BTW, I really appreciate Scan Tailor experimental; it's an immensely useful program. I'm scanning several hundred paperbacks, and I couldn't do it without Scan Tailor. Tulon, I was looking for a way to send a donation your way, but couldn't find anyplace to send it on the github page? If you're accepting contributions, just let me know where to send it.

abmartin
Posts: 79
Joined: 15 Sep 2010, 15:33
Number of books owned: 2000
Country: USA
Location: Ohio

Re: Scan Tailor Experimental no-DPI causing issues with pdfbeads and imagemagick

Post by abmartin » 25 Aug 2016, 19:54

alraban,

If you know what the DPI of your images is, you can input that DPI into the files themselves with a quick imagemagick command.

If you are doubling the resolution with scantailor (output stage setting) and have a 1200 DPI final output:
mogrify -density 1200 -units PixelsPerInch *.tif

Or if scantailor is just outputting 1 to 1 for a 600 DPI output:
mogrify -density 600 -units PixelsPerInch *.tif

That will define the resolution of all tifs in your directory. Hopefully then PDFBeads will deal with it properly.
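
If you process many batches with different settings, the same idea can be scripted. The following is a rough Python sketch, not a tested workflow: it works out the output DPI as scan DPI times the Scan Tailor resolution factor and builds the matching mogrify command from the two examples above. The helper names `output_dpi` and `mogrify_command` are made up for illustration, and the script only prints the command instead of running it.

```python
import glob


def output_dpi(scan_dpi, enhancement_factor=1):
    """DPI of Scan Tailor's output: scan DPI times the output-stage factor."""
    return int(scan_dpi * enhancement_factor)


def mogrify_command(dpi, files):
    """Build the mogrify call shown above as an argument list."""
    return ["mogrify", "-density", str(dpi), "-units", "PixelsPerInch", *files]


# 600 DPI scans with the resolution doubled in the output stage.
tifs = sorted(glob.glob("*.tif"))
cmd = mogrify_command(output_dpi(600, 2), tifs)

# Dry run: print the command. To apply it, pass cmd to
# subprocess.run(cmd, check=True). mogrify rewrites files in place,
# so try it on copies first.
print(" ".join(cmd))
```

Passing the file names as a list avoids relying on shell wildcard expansion, which is the main design difference from typing the one-liner by hand.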

Tulon
Posts: 687
Joined: 03 Oct 2009, 06:13
Number of books owned: 0
Location: London, UK

Re: Scan Tailor Experimental no-DPI causing issues with pdfbeads and imagemagick

Post by Tulon » 06 Sep 2016, 10:15

alraban wrote: 2) to ask Tulon if Scan Tailor experimental could have an option to set a file DPI (or at least to preserve existing information instead of stripping it);
It's not impossible, though I am not working on ST at the moment.
alraban wrote: BTW, I really appreciate Scan Tailor experimental; it's an immensely useful program. I'm scanning several hundred paperbacks, and I couldn't do it without Scan Tailor. Tulon, I was looking for a way to send a donation your way, but couldn't find anyplace to send it on the github page? If you're accepting contributions, just let me know where to send it.
I used to accept donations, though I no longer do. You see, legally I have to pay taxes on them. At some point the figures went so low that it wasn't even worth the bother.
Scan Tailor experimental doesn't output 96 DPI images. It's just what your software shows when DPI information is missing. Usually what you get is input DPI times the resolution enhancement factor.

alraban
Posts: 3
Joined: 17 Aug 2016, 08:08
Number of books owned: 1500
Country: USA

Re: Scan Tailor Experimental no-DPI causing issues with pdfbeads and imagemagick

Post by alraban » 07 Sep 2016, 23:25

Tulon wrote:
alraban wrote: 2) to ask Tulon if Scan Tailor experimental could have an option to set a file DPI (or at least to preserve existing information instead of stripping it);
It's not impossible, though I am not working on ST at the moment.
alraban wrote: BTW, I really appreciate Scan Tailor experimental; it's an immensely useful program. I'm scanning several hundred paperbacks, and I couldn't do it without Scan Tailor. Tulon, I was looking for a way to send a donation your way, but couldn't find anyplace to send it on the github page? If you're accepting contributions, just let me know where to send it.
I used to accept donations, though I no longer do. You see, legally I have to pay taxes on them. At some point the figures went so low that it wasn't even worth the bother.
I understand completely on both counts, and I appreciate the reply. If you change your mind about donations (or set up a bounty source type thing for feature requests), let me know. Given that I've gotten a tremendous amount of use out of your software (more than some quite expensive scanning related software that shall remain nameless), I'd be glad to kick in $100.

I try to support open source projects when I can, but I'm not a programmer so all I have to contribute is donations. I've often found, though, that, perhaps unsurprisingly, open source developers aren't often that interested in money :-)

ymmv
Posts: 1
Joined: 24 Oct 2016, 08:38
Number of books owned: 0
Country: Netherlands

Re: Scan Tailor Experimental no-DPI causing issues with pdfbeads and imagemagick

Post by ymmv » 26 Oct 2016, 04:45

I used Scan Tailor Experimental for the first time a couple of days ago and noticed a problem in a PDF document that I'd created from the TIFs in Adobe Acrobat Pro XI. The OCR function didn't work anymore. After "reconverting" the TIFs to TIF using IrfanView and creating the PDF document again, OCR worked again.

I never had this problem with the regular version of Scan Tailor (which is indeed an immensely useful program that I used hundreds of times for books and magazines). I hope that one day this problem can be fixed. The Experimental version is so much faster than the old version!
