Training, by W. G. George, 1902

vitorio
Posts: 138
Joined: 30 Oct 2010, 23:56
Number of books owned: 0
Location: Austin, Texas, USA

Re: Training, by W. G. George, 1902

Post by vitorio » 18 Mar 2012, 03:52

vitorio wrote: That last link is a .zip file which includes the missing page 21 (named 366-1 so it should sort between 365 and 366 automatically), plus a few other pages that looked egregiously blurry or out-of-focus, named for the files they should replace.
It turns out there's one more page missing: a blank left-side page after the left-side green page. If your page numbering starts at 1 with the right-side front cover, then it's page 160 that's missing, and your pages should run from 1 to 164. The end of the book goes: left-side internal page number 134, right-side blank, left-side green page, right-side blank, missing left-side blank, right-side blank with a v-shaped tear at the top, left-side pattern, right-side pattern, left-side back cover. Because it's blank, I'm just substituting another left-side blank page rather than scanning it.

Also, ImageMagick's "convert" recompresses the already-compressed JPEG files, losing some unknown amount of quality and adding artifacts. I've been using the lossless "jpegtran" utility to rotate and crop the files, but I haven't found anything that can just insert new DPI measurements into a JPEG file without recompressing it (including the ImageMagick "mogrify" command, despite what the manual says), so next I'm going to edit the Scan Tailor project manually to set the DPI per file.
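
A minimal sketch of one way to set the DPI without recompressing, offered as an illustration rather than a tested workflow: when a JPEG begins with a standard JFIF APP0 segment, the density fields sit at fixed byte offsets and can be overwritten in place. The function name `set_jfif_dpi` is hypothetical. The sketch assumes the JFIF segment comes first in the file; files straight from a camera often start with an Exif segment instead, and this code rejects those rather than guessing. Whether Scan Tailor reads the JFIF density is not confirmed here.

```python
import struct


def set_jfif_dpi(data: bytes, dpi: int) -> bytes:
    """Return a copy of the JPEG bytes with the JFIF density set to `dpi`.

    Only header bytes change; the compressed image data is left untouched.
    (Hypothetical helper, assumes a JFIF APP0 segment directly after SOI.)
    """
    # Layout: FFD8 (SOI), FFE0 (APP0), 2-byte length, b"JFIF\0",
    # 2-byte version, 1-byte units, 2-byte X density, 2-byte Y density.
    if data[:4] != b"\xff\xd8\xff\xe0" or data[6:11] != b"JFIF\x00":
        raise ValueError("no JFIF APP0 segment at the start of the file")
    if not 1 <= dpi <= 0xFFFF:
        raise ValueError("dpi must fit in an unsigned 16-bit field")
    patched = bytearray(data)
    # units = 1 means the densities are in dots per inch.
    patched[13:18] = struct.pack(">BHH", 1, dpi, dpi)
    return bytes(patched)
```

Usage would be to read a file with `open(path, "rb")`, pass the bytes through `set_jfif_dpi`, and write the result to a new file, keeping the original until the output has been checked.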

Post by vitorio » 19 Mar 2012, 02:33

Some notes:
  • I split the images into sets: the two different sets of shots for the right pages, and the strangely different DPI sets for the left pages. Within each set, I measured the DPI of every tenth image, plotted the values against image number, and manually entered interpolated values for the in-between images into the Scan Tailor project file. Then I measured the retaken images and replaced the DPI for each one of those, too.
  • To deal with the huge signatures in the back of the book, I used 0.5" as my top and bottom margin, and 0.7" as my left and right margin. I measured the margins of each page on-screen, having decided that the margins should be set against the text block and that the images and signatures would bleed into those margins, and adjusted those pages accordingly.
  • This is a great technique:
    rob wrote: Using the same control at the bottom of the thumbnail list in the margin phase, order by height, find the tallest page, make the margins smaller for that page (or check off "same page size"), and keep doing that until you start hitting your "normal" pages. Do the same with width.
  • It seemed like I needed to add at least 10 weight to all of the text.
  • I originally intended to preserve the book covers and all the blank pages, but I couldn't get a full-page image of the cover and interior patterns that I was satisfied with, and when I actually assembled the PDF, it no longer felt necessary.
  • After Scan Tailor was through with the "mixed" pages, I loaded them into Photoshop and saved them as greyscale images to match the character of the rest of the text.
  • I have no idea what I'm doing in Acrobat to assemble these, but I took a stab at it: JPEG 2000 images downsampled to 300 DPI, and JBIG2 lossy compression for monochrome images (which should be all the text pages) with no downsampling.
  • I want to do a "digitization colophon" before I put the book up anywhere seriously public.
So: Training, PDF draft 6
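
The DPI interpolation described in the first note above (measure every tenth image, fill in the ones in between) can be sketched roughly as follows. This is an illustration under assumptions, not the procedure actually used: the function name `interpolate_dpi` is hypothetical, the interpolation is plain linear between neighbouring measurements, and images outside the measured range simply reuse the nearest measured value.

```python
def interpolate_dpi(samples: dict[int, float], count: int) -> list[float]:
    """Estimate a DPI for images 0..count-1 from sparse measurements.

    `samples` maps an image index to its measured DPI. Values between two
    measurements are linearly interpolated; values before the first or
    after the last measurement reuse the nearest measurement.
    """
    if not samples:
        raise ValueError("need at least one measurement")
    known = sorted(samples.items())
    result = []
    for i in range(count):
        if i <= known[0][0]:
            result.append(float(known[0][1]))
        elif i >= known[-1][0]:
            result.append(float(known[-1][1]))
        else:
            # Find the pair of measurements that bracket image i.
            for (i0, d0), (i1, d1) in zip(known, known[1:]):
                if i0 <= i <= i1:
                    t = (i - i0) / (i1 - i0)
                    result.append(d0 + t * (d1 - d0))
                    break
    return result
```

With made-up measurements such as `{0: 300.0, 10: 310.0}`, the image halfway between the two gets a value halfway between the two DPIs. The resulting list still has to be entered into the Scan Tailor project file, which this sketch does not attempt.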

rob
Posts: 773
Joined: 03 Jun 2009, 13:50
E-book readers owned: iRex iLiad, Kindle 2
Number of books owned: 4000
Country: United States
Location: Maryland, United States

Post by rob » 19 Mar 2012, 18:32

vitorio wrote: Also, using ImageMagick's "convert" is recompressing the already-compressed JPEG files, losing some unknown amount of quality and adding artifacts.
There's always "-quality 100%" which will recompress "with a minimal amount of loss". But using jpegtran is fine -- just another step in processing, which isn't a big deal if you're using command line tools anyway!
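
For the jpegtran route, a small sketch of how the extra step might be scripted. The helper names `jpegtran_rotate_cmd` and `rotate_lossless` are hypothetical; the flags used (`-copy all`, `-perfect`, `-rotate`, `-outfile`) are standard jpegtran options, and the sketch assumes jpegtran is installed and on the PATH.

```python
import subprocess


def jpegtran_rotate_cmd(src: str, dst: str, degrees: int) -> list[str]:
    """Build a jpegtran command line for a lossless rotation."""
    if degrees not in (90, 180, 270):
        raise ValueError("jpegtran rotates only by 90, 180 or 270 degrees")
    return [
        "jpegtran",
        "-copy", "all",           # carry over Exif and other extra markers
        "-perfect",               # fail instead of leaving odd edge blocks
        "-rotate", str(degrees),
        "-outfile", dst,
        src,
    ]


def rotate_lossless(src: str, dst: str, degrees: int) -> None:
    """Run jpegtran; raises CalledProcessError if the tool reports failure."""
    subprocess.run(jpegtran_rotate_cmd(src, dst, degrees), check=True)
```

The `-perfect` flag is a cautious choice: if the image dimensions do not allow an exact lossless transform, jpegtran stops with an error instead of producing a slightly different image.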
The Singularity is Near. ~ http://halfbakedmaker.org ~ Follow me as I build the world's first all-mechanical steam-powered computer.

Post by rob » 19 Mar 2012, 18:38

vitorio wrote: After Scan Tailor was through with the "mixed" pages, I loaded them into Photoshop and saved them as greyscale images to match the character of the rest of the text.
OMG, you have really got to use ImageMagick for batch processing images :) The "-colorspace Gray" option is what you want. I just can't imagine sitting there and doing each image manually in Photoshop, even with a macro.
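
A rough sketch of what that batch step could look like when scripted. The helper names `grey_cmd` and `convert_dir` are hypothetical, the `*.tif` pattern is an assumption about the output files, and the command name `convert` assumes an ImageMagick install that still provides it (newer versions use `magick`).

```python
import subprocess
from pathlib import Path


def grey_cmd(src: str, dst: str) -> list[str]:
    """Build an ImageMagick command that writes a greyscale copy of src."""
    return ["convert", src, "-colorspace", "Gray", dst]


def convert_dir(src_dir: str, dst_dir: str, pattern: str = "*.tif") -> int:
    """Convert every matching file in src_dir; return how many were done."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    done = 0
    for src in sorted(Path(src_dir).glob(pattern)):
        # Originals are left alone; each copy lands in dst_dir.
        subprocess.run(grey_cmd(str(src), str(out / src.name)), check=True)
        done += 1
    return done
```

Writing to a separate output directory keeps the Scan Tailor originals intact in case the greyscale result needs redoing.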