huge output files?

Post by matt »

Hi all,

Are other folks seeing really huge output TIFF files (e.g. 100-140MB each) from Scan Tailor? I fear that this is just the nature of uncompressed TIFF files and that I'm going to have to free up a lot of disk space (e.g. 100+ GB) in order to process some of the large 800-1200 page textbooks I'm archiving.

For some context, my latest experimental workflow has been to take my initial (2MB) JPGs and run them through a preprocessing ImageMagick script which rotates, sharpens, and normalizes the contrast a bit. This turns the original 2MB JPGs into 4-6MB TIFFs (TIFF is used to avoid further compression losses). It is these TIFFs that I'm sending to Scan Tailor.

Would I be better off with a different approach? Any suggestions on ways to minimize this explosion in file size throughout the process?

Thanks!

Matt

-rw-r--r-- 1 matt staff 140M May 30 12:31 merge_0050-right-IMG_0025.tif
-rw-r--r-- 1 matt staff 130M Jun  1 04:54 merge_0761-left-IMG_0381.tif
-rw-r--r-- 1 matt staff 125M May 30 14:03 merge_0291-left-IMG_0146.tif
-rw-r--r-- 1 matt staff 124M Jun  1 03:23 merge_0489-left-IMG_0245.tif
-rw-r--r-- 1 matt staff 123M Jun  1 03:46 merge_0559-left-IMG_0280.tif
-rw-r--r-- 1 matt staff 122M Jun  1 03:12 merge_0455-left-IMG_0228.tif
-rw-r--r-- 1 matt staff 121M May 30 12:32 merge_0052-right-IMG_0026.tif
-rw-r--r-- 1 matt staff 121M May 30 14:17 merge_0335-left-IMG_0168.tif
-rw-r--r-- 1 matt staff 120M Jun  1 03:15 merge_0465-left-IMG_0233.tif
-rw-r--r-- 1 matt staff 120M Jun  1 03:17 merge_0471-left-IMG_0236.tif
-rw-r--r-- 1 matt staff 120M Jun  1 04:44 merge_0733-left-IMG_0367.tif
-rw-r--r-- 1 matt staff 120M Jun  1 02:56 merge_0407-left-IMG_0204.tif
-rw-r--r-- 1 matt staff 119M May 30 14:26 merge_0363-left-IMG_0182.tif
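
As a rough check on the disk space mentioned above, here is a minimal sketch that multiplies the page counts (800-1200) by the per-file sizes (100-140MB) quoted in the post. The function name `total_gb` is made up for illustration, and 1 GB is taken as 1000 MB.

```python
# Rough disk-space estimate for Scan Tailor output, using the figures from the post.
# total_gb is a hypothetical helper; 1 GB is assumed to be 1000 MB.

def total_gb(pages: int, mb_per_page: float) -> float:
    """Total size in GB for `pages` files of `mb_per_page` MB each."""
    return pages * mb_per_page / 1000


low = total_gb(800, 100)    # smallest book, smallest files
high = total_gb(1200, 140)  # largest book, largest files
print(f"{low:.0f} GB to {high:.0f} GB")
```

So the "100+ GB" guess is in the right range if every page really comes out at that size.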

Post by Misty »

A few questions -

The main one is what mode you're using in Scan Tailor. Black and white, greyscale/colour, or mixed?

Are you scanning with a flatbed scanner, or a camera? If it's a flatbed, what DPI are you using? What DPI is Scan Tailor set to output?
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.

Post by matt »

As I'm working mostly with books with lots of graphics/photos, I'm using Scan Tailor in color/mixed mode. I'm acquiring images using a pair of Canon A560 cameras driven by CHDK intervalometer (time-lapse) scripts; the cameras output 7.1 megapixel JPGs.

Post by Tulon »

The output of Scan Tailor was intended to be an intermediate step before DjVu or PDF encoding. As such, its size wouldn't matter. Basically, you shouldn't store Scan Tailor output as is.
Scan Tailor experimental doesn't output 96 DPI images. It's just what your software shows when DPI information is missing. Usually what you get is input DPI times the resolution enhancement factor.

Post by matt »

Tulon wrote: The output of Scan Tailor was intended to be an intermediate step before DjVu or PDF encoding. As such, its size wouldn't matter. Basically, you shouldn't store Scan Tailor output as is.

Yes, the next step for my Scan Tailor output is PDF (with OCR). However, as I'm on a laptop without tons of storage, it's a bit of a challenge to keep 120GB free for the temporary files needed to process a large book via ST. (In addition, I notice that the 100+ MB TIFFs cause most of my editing/manipulation programs to bog down quite a bit in terms of load/save time.)

I'm curious why the file size goes from a 4-6MB TIFF (from imagemagick) and comes out as a 100+MB TIFF from ST (especially given that ST is using LZW compression on its output)?

Experimenting a bit, I see that changing my output setting from 600 DPI to 300 DPI reduced the file size significantly with no obvious degradation in quality (if anything, I notice fewer speckles in the 300 DPI version). Does anyone have any suggestions/observations regarding this situation?

Thanks!

Post by Misty »

Scan Tailor's output DPI works by interpolating the black and white portions of the image, e.g. the text. It does a great job of creating smooth, detailed text. What it doesn't do is add any detail to images; those just get plain bilinear interpolation, which makes them take up more hard drive space. It sounds like you've been setting Scan Tailor to a DPI much, much higher than your original camera photos and, because so many pages have images, that's making each page take up much more space.

Measure the DPI of your original camera scans. You can do that by measuring the height of your page in inches, then finding out how many pixels tall the page is in the scan; divide the pixel height by the height in inches to get the DPI.
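
The measurement described above is a single division. A minimal sketch, where the function name `measured_dpi` and the sample numbers (a 9.25 inch page that is 2200 pixels tall in the photo) are hypothetical and only there to illustrate:

```python
# DPI of a camera scan = page height in pixels / page height in inches.
# measured_dpi and the sample values below are hypothetical illustrations.

def measured_dpi(page_height_px: int, page_height_in: float) -> float:
    """Return the effective DPI of a photographed page."""
    if page_height_in <= 0:
        raise ValueError("page height in inches must be positive")
    return page_height_px / page_height_in


# Example: the page measures 9.25 inches and spans 2200 pixels in the image.
print(round(measured_dpi(2200, 9.25)))
```

Measure only the page itself, not the whole frame, since the photo usually includes some background around the book.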

Post by Tulon »

The numbers still don't add up. I just did a quick test with 600 DPI output in Color / Grayscale mode and got a 12MB file. If the input file were color rather than grayscale, I suppose I would have got a 36MB file. In the very worst case, with a picture covering the whole page, I suppose I would have got something like a 60MB file. I think the DPI of Matt's input files is wrong (too low). I can take a look if you give me a sample file.

Post by matt »

Just tried again, specifically setting the DPI to 180 (as reported by my camera). Output size from ST was about 30MB at 300 DPI and about 114MB at 600 DPI. The original image is here: http://grab.by/4IjX and two small related screenshots are at http://grab.by/4IjY and http://grab.by/4IjX
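
The two sizes reported above can be compared with what plain pixel counting predicts: doubling the output DPI doubles both width and height, so the pixel count grows by a factor of four. A small sketch of that comparison (the helper name `pixel_ratio` is hypothetical; LZW compression means file sizes need not follow the pixel count exactly):

```python
# Compare the observed file-size ratio with the expected pixel-count ratio.
# pixel_ratio is a hypothetical helper; the sizes are the ones reported in the post.

def pixel_ratio(dpi_from: float, dpi_to: float) -> float:
    """Factor by which the pixel count changes when the output DPI changes."""
    return (dpi_to / dpi_from) ** 2


expected = pixel_ratio(300, 600)  # pixel count ratio going from 300 to 600 DPI
observed = 114 / 30               # reported file sizes in MB
print(f"expected about {expected:.1f}x, observed about {observed:.1f}x")
```

The observed ratio is close to the expected one, which suggests the size is driven mainly by the number of output pixels.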

Post by Tulon »

A camera can't possibly give you the correct DPI, as it doesn't know the distance to the object.
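
To see why a wrong input DPI matters, here is a rough sketch under several assumptions that are not from the thread: a 3072 x 2304 pixel frame (a guess at what a 7.1 megapixel camera produces), a true resolution of 300 DPI (purely hypothetical), output dimensions that scale by output DPI divided by input DPI, and uncompressed 24-bit color. It ignores cropping, black and white regions, and LZW compression, so it is an upper-bound illustration rather than a description of Scan Tailor's internals.

```python
# Illustration: how the assumed input DPI changes the output pixel dimensions.
# All numbers and the scaling model are assumptions, not Scan Tailor internals.

def output_size(width_px: int, height_px: int, input_dpi: float,
                output_dpi: float, bytes_per_pixel: int = 3):
    """Return (width, height, uncompressed bytes) after rescaling to output_dpi."""
    out_w = round(width_px * output_dpi / input_dpi)
    out_h = round(height_px * output_dpi / input_dpi)
    return out_w, out_h, out_w * out_h * bytes_per_pixel


MIB = 1024 * 1024
for input_dpi in (180, 300):  # 180 = camera's tag, 300 = hypothetical true value
    w, h, size = output_size(3072, 2304, input_dpi, 600)
    print(f"input {input_dpi} DPI -> {w} x {h} px, {size / MIB:.0f} MiB uncompressed")
```

Under these assumptions, telling Scan Tailor the input is 180 DPI when it is really higher makes the 600 DPI output several times larger than it needs to be.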

Post by Misty »

The DPI the camera gave you is simply a default value for printing. It doesn't apply to the DPI of the object you're photographing. To get the actual DPI of the book, use the formula I posted above.