General PDF/Output Size of a Scanned Book?

Discussions, questions, comments, ideas, and your projects having to do with DIY Book Scanner software. This includes the Stereo Data Maker software for the cameras, post-processing software, utilities, OCR packages, and so on.

Moderator: peterZ

zenofire
Posts: 1
Joined: 29 Mar 2011, 08:17

Post by zenofire »

What sizes do you guys get when you scan a complete book? I've been trying to scan my stuff the bonehead way (using a flatbed scanner connected to my laptop) and reached 90 MB with only 40 pages (settings: colour image, 400 DPI). Quite astounding. :|

For example:
100 pages / black and white
100 pages / colour

...

(Just slightly curious, because I am extremely interested in digitising books but am worried about how much storage the results will take up. I've been digitising local magazines I have found: I just rip out the spines, slice the pages in half, and scan them at A4.)
Misty
Posts: 481
Joined: 06 Nov 2009, 12:20
Number of books owned: 0
Location: Frozen Wasteland

Re: General PDF/Output Size of a Scanned Book?

Post by Misty »

Magazines will be much harder to keep small than books. High-resolution colour scans take up an enormous amount of space.

When scanning books, most people convert their scans to binarized form, where there are only two colours: pure black and pure white. It works decently for books that contain only text, though it does mean you lose the original book's appearance, so I wouldn't use it for historic books. Depending on the compression you use, you could expect well under 10 MB for 100 black and white pages. A 70-page book PDF I compressed at my previous employer using lossy JBIG2 came out under 6 MB with OCR embedded, and smaller still without OCR. Here's a link to the book in question.
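To make the binarization step concrete, here is a minimal Python sketch that picks a single global threshold with Otsu's method and then packs the result at one bit per pixel. This is one common way to binarize, not necessarily what any particular scanning tool does (many use locally adaptive thresholds instead). It assumes the page has already been loaded as a flat list of 8-bit grayscale values, and the function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0-255) maximising between-class variance.

    Pixels <= t form the dark class (ink), pixels > t the light class (paper).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        if weight_dark == 0:
            continue  # no dark pixels yet, nothing to compare
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is now in the dark class
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 1 (black ink) if at or below threshold, else 0 (white)."""
    return [1 if p <= threshold else 0 for p in pixels]


def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, 8 pixels per byte, MSB first."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        byte = 0
        for b in chunk:
            byte = (byte << 1) | b
        byte <<= 8 - len(chunk)  # pad a short final chunk on the right
        out.append(byte)
    return bytes(out)
```

Packing eight pixels per byte is where the first big saving comes from: a bilevel page starts out at 1/24 the size of a 24-bit colour page before any compression such as JBIG2 is applied.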

For high-resolution colour images, there's not much you can do to reduce file sizes except use a more efficient compression format, such as JPEG2000. Losslessly compressed high-res colour images will always take up large amounts of space, and there's not much that can be done about it; it's the difference between "huge" and "slightly less huge".
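Some back-of-the-envelope arithmetic shows why colour scans are so large. The sketch below computes the uncompressed size of one page image from its physical size, resolution, and bit depth. The function name is hypothetical, and it assumes full A4 pages (210 × 297 mm) with each row padded to a whole number of bytes.

```python
MM_PER_INCH = 25.4


def raw_page_bytes(width_mm, height_mm, dpi, bits_per_pixel):
    """Uncompressed size of one page image in bytes (rows padded to whole bytes)."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


if __name__ == "__main__":
    # A4 at 400 DPI: 24-bit colour versus 1-bit (binarized) black and white
    colour = raw_page_bytes(210, 297, 400, 24)
    bilevel = raw_page_bytes(210, 297, 400, 1)
    print(f"colour:  {colour / 1e6:.1f} MB per page")
    print(f"bilevel: {bilevel / 1e6:.1f} MB per page")
```

At 400 DPI this gives about 46.4 MB for a 24-bit colour page versus about 1.9 MB for a bilevel one. Assuming full A4 pages, 40 colour pages would be roughly 1.86 GB uncompressed, so the 90 MB in the opening post already reflects substantial compression.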
The opinions expressed in this post are my own and do not necessarily represent those of the Canadian Museum for Human Rights.
Lazy_Kent
Posts: 37
Joined: 26 Oct 2010, 10:06
Number of books owned: 0
Location: Moscow

Re: General PDF/Output Size of a Scanned Book?

Post by Lazy_Kent »

301 pages, black and white, 300 DPI: 4.1 MB

http://diybookscanner.org/forum/viewtop ... 6482#p6482
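Dividing the totals reported in this thread by their page counts makes them easier to compare. The sketch below reads the 4.1 Mb figure as megabytes, treats 1 MB as 1000 KB, and uses the "under 6MB" JBIG2 result as an upper bound; the names are illustrative only.

```python
# (total size in MB, page count) as quoted in this thread
REPORTED = {
    "colour flatbed, 400 DPI (opening post)": (90.0, 40),
    "lossy JBIG2 with OCR (upper bound)": (6.0, 70),
    "black and white, 300 DPI": (4.1, 301),
}


def kb_per_page(total_mb, pages):
    """Average size per page in KB, treating 1 MB as 1000 KB."""
    return total_mb * 1000 / pages


for label, (mb, pages) in REPORTED.items():
    print(f"{label}: {kb_per_page(mb, pages):.1f} KB/page")
```

That works out to about 2,250 KB per page for the colour flatbed scans against roughly 13.6 KB per page for the 300 DPI black and white book.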
rob
Posts: 773
Joined: 03 Jun 2009, 13:50
E-book readers owned: iRex iLiad, Kindle 2
Number of books owned: 4000
Country: United States
Location: Maryland, United States

Re: General PDF/Output Size of a Scanned Book?

Post by rob »

I second Misty's reply -- if you want color, go with JPEG2000 and then stick all the resulting pages into a PDF. You can use ImageMagick to convert your scanner's output JPEGs to JPEG2000. I took a 3.8 MB JPEG file and converted it to JPEG2000:

convert IMG_0033.JPG -compress JPEG2000 -quality 20 IMG_0033.jp2

The result was 280 KB, and it still looked OK.
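To apply the same conversion to a whole folder of scanner output, a small Python wrapper can call ImageMagick once per file. This is a sketch under a few assumptions: ImageMagick with JPEG 2000 support is installed and its convert command is on the PATH (ImageMagick 7 installs may expose it as magick instead), and jp2_command and convert_folder are hypothetical helper names. It only matches files ending in .jpg in any letter case, not .jpeg.

```python
import shutil
import subprocess
from pathlib import Path


def jp2_command(src: Path, quality: int = 20) -> list[str]:
    """Build the same ImageMagick call as above for a single JPEG file."""
    return ["convert", str(src), "-compress", "JPEG2000",
            "-quality", str(quality), str(src.with_suffix(".jp2"))]


def convert_folder(folder: str, quality: int = 20) -> list[Path]:
    """Convert every .jpg/.JPG in folder to .jp2 and return the output paths."""
    # Assumption: the legacy 'convert' entry point exists on this system
    if shutil.which("convert") is None:
        raise RuntimeError("ImageMagick 'convert' not found on PATH")
    outputs = []
    # sorted() keeps scanner numbering (IMG_0001, IMG_0002, ...) in page order
    for src in sorted(Path(folder).glob("*.[jJ][pP][gG]")):
        subprocess.run(jp2_command(src, quality), check=True)  # raise on failure
        outputs.append(src.with_suffix(".jp2"))
    return outputs
```

Calling convert_folder("scans") writes a .jp2 next to each JPEG. For the step of sticking the pages into a PDF, one option is the img2pdf tool (for example, img2pdf *.jp2 -o book.pdf), which embeds JPEG 2000 images without re-encoding them.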

--Rob
The Singularity is Near. ~ http://halfbakedmaker.org ~ Follow me as I build the world's first all-mechanical steam-powered computer.