Training, by W. G. George, 1902

vitorio
Posts: 138
Joined: 30 Oct 2010, 23:56
Number of books owned: 0
Location: Austin, Texas, USA

Training, by W. G. George, 1902

Post by vitorio » 04 Mar 2012, 02:21

A friend of mine is a pretty hardcore fitness-for-fun sort of guy. He does those crazy races where you run through mud carrying a log, that sort of thing, and his interests include barefoot running, proper form for long-distance running, and ultramarathoning.

In November 2011, in the New York Times Magazine, Christopher McDougall, who had previously written about his time with the long-distance-running Tarahumara Indians of Mexico, wrote about a "lost form" of running called the 100-Up, invented in the late 1800s by W. G. George, the eventual world champion runner.

George wrote about his technique at the time; his work eventually passed into the public domain and was later reprinted in a book called the Five Kings of Distance, where McDougall found it. However, George's original works aren't preserved in Google Books or the Internet Archive, and the Kings of Distance is out of print. When the article came out, the handful of copies that were available were immediately sold or had their prices raised.

George's works seem to be extremely rare. Only a handful of libraries have them, including my local University of Texas sports library (a copy believed to be from 1913), but I haven't yet figured out how to transport my scanner safely, nor asked permission to digitize the work. (The main library has said okay, but each college or department library is independent.) At the time the McDougall article was published, the only readily available reference was the quotations in a 1904 compilation by Eustace Miles titled An Alphabet of Athletics.

Since then, the text of a 1908 work has gone online at hundredup.com, but none of these is equivalent to a proper scan and release of the original work. (I feel McDougall should have done this as part of publishing his article.) So:

This copy of Training, from 1902, is George's own compilation of his essays with those of his contemporaries, and was checked out from the University of Chicago library.

It's being scanned in Austin, Texas, USA, with a Sola Technical all-acrylic scanner ordered from Ponoko, in a custom blackout tower that lets it be used in a normal, well-lit indoor room. The camera is a Sony Cyber-Shot DSC-W570, mounted on the row of holes closest to the platen, with the lens centered over the book (one hole to the right of the center hole).

The camera's settings are 1.9x zoom, Program Auto, 4:3 16M shots, 0 EV, ISO 80, white balance auto-set against a white piece of paper on the platen, multi AF focus, multi metering mode, normal smile sensitivity, face detection off, and DRO off.

The overhead lamp is a 65-watt, 2000-lumen GE Reveal indoor floodlight, approximately 36 inches from the book, directly over the spine. The memory card is a 2GB Eye-Fi card. I believe I am achieving scans of approximately 495×464 dpi: 4608×3456-pixel shots of a 7×4.5-inch page, minus the room used by the focus targets on either side of the book. (I do not recommend this scanner or this camera: the all-acrylic scanner is too susceptible to reflections, and the camera does not support manual focus.)
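
As a rough check on that figure (a sketch, not part of the original setup), resolution is just the number of pixels a physical length spans, divided by that length in inches. The helper name `px_per_inch` is made up for illustration, and the usage below assumes the 7-inch page height fills the camera's 3456-pixel short side, which overstates the true value slightly if there is any margin around the page.

```shell
# px_per_inch PIXELS INCHES
# Prints the resolution, rounded to a whole number, for a feature that spans
# PIXELS in the photo and INCHES on the physical page.
px_per_inch() {
    awk -v px="$1" -v len="$2" 'BEGIN { printf "%.0f\n", px / len }'
}
```

For example, `px_per_inch 3456 7` prints 494, close to the 495 quoted above.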

The scans are half done right now. I'll update this thread as I finish the rest and start putting them through Scan Tailor.

rob
Posts: 773
Joined: 03 Jun 2009, 13:50
E-book readers owned: iRex iLiad, Kindle 2
Number of books owned: 4000
Country: United States
Location: Maryland, United States

Re: Training, by W. G. George, 1902

Post by rob » 04 Mar 2012, 09:58

Nice! Preservation of rare out-of-print books is always laudable. After you scan the book, I hope you will upload it to the Internet Archive. Even if you're unhappy with the quality of the scans, you can still upload them, and if you get better scans later, you can upload a second copy.
The Singularity is Near. ~ http://halfbakedmaker.org ~ Follow me as I build the world's first all-mechanical steam-powered computer.

vitorio
Posts: 138
Joined: 30 Oct 2010, 23:56
Number of books owned: 0
Location: Austin, Texas, USA

Re: Training, by W. G. George, 1902

Post by vitorio » 12 Mar 2012, 02:32

Even carefully duplicating my configuration across sessions and battery charges, I ended up with three sets of photos, each with a slightly different zoom level (although the camera reported 1.9x each time), so I have three slightly different DPIs.

Also, some of the pages have content that spills into the margins, and I'm not sure how to deal with that in Scan Tailor. I assume I'll ultimately want to generate 300dpi output rather than scaling it up to 600dpi, but in my initial tests some pages come out at different pixel sizes, with differently sized rendered content, which I don't want.

In addition, the book isn't faring very well, so I'm hesitant to try to take all-new photos in a single session.

The photos are around 925MB total. I'm willing to put them online if anyone is interested in experimenting with post-processing them.

I'll be working them over starting this week.

rob
Posts: 773
Joined: 03 Jun 2009, 13:50
E-book readers owned: iRex iLiad, Kindle 2
Number of books owned: 4000
Country: United States
Location: Maryland, United States

Re: Training, by W. G. George, 1902

Post by rob » 12 Mar 2012, 15:53

I would definitely be interested in trying to postprocess them. Text bleeding into the margins shouldn't be that big a deal -- you might have to manually find the text areas for Scan Tailor, but I think you can play around with the margin settings to get every page the same size.

As for dpi, if you have a range of images at a particular dpi, I can probably run a quick script that sets the correct dpi on those images so that ST has the right dpi to work with. Probably the best thing to do is measure some pages' text blocks with a ruler -- vertical and horizontal -- so that I can measure the corresponding image and calculate the right dpi.
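
A minimal sketch of that calculation, assuming the ruler measurement is in centimetres and the matching pixel size was read off the image in an editor. The function names `dpi_from_cm` and `block_dpi` and the sample numbers are hypothetical. Measuring both axes gives a horizontal and a vertical figure, which should roughly agree if the camera is parallel to the page.

```shell
# dpi_from_cm PIXELS CENTIMETRES -> rounded dots per inch (1 inch = 2.54 cm)
dpi_from_cm() {
    awk -v px="$1" -v cm="$2" 'BEGIN { printf "%.0f\n", px / (cm / 2.54) }'
}

# block_dpi WIDTH_PX HEIGHT_PX WIDTH_CM HEIGHT_CM
# Prints "HORIZONTALxVERTICAL", the same form rob's -density arguments use.
block_dpi() {
    printf '%sx%s\n' "$(dpi_from_cm "$1" "$3")" "$(dpi_from_cm "$2" "$4")"
}
```

For example, a hypothetical text block of 7.7 cm by 12.8 cm that spans 1500×2480 pixels gives `block_dpi 1500 2480 7.7 12.8`, which prints `495x492`.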

For something like this, a good idea would be to get a free Spideroak account. I think you get 2GB of storage. You can then upload all your images, set up a share, email the URL, and whoever gets it can download the images.

If there aren't any color images, then it is best to output at 600 dpi, since it looks much better on screen and when printed. The upscale algorithm ST uses is very good and generates quality output.

vitorio
Posts: 138
Joined: 30 Oct 2010, 23:56
Number of books owned: 0
Location: Austin, Texas, USA

Re: Training, by W. G. George, 1902

Post by vitorio » 12 Mar 2012, 21:51

There aren't any color images; the pictures are black-and-white. But I don't see how to tell Scan Tailor to make them grayscale while keeping the text monochrome. I guess that's a post-post-processing step.

Here's a PDF of six pages. Specifying approximate DPIs for the three sets gets me pretty close, but if you flip between pages 2 and 6, you'll see that the page number at the top and the margins are off from each other by, I dunno, 10-20 pixels. Maybe if I get the DPI precisely right, it'll just work.

I'll start uploading the originals tomorrow.

Heelgrasper
Posts: 70
Joined: 19 Feb 2012, 21:04
E-book readers owned: None
Number of books owned: 500
Location: Randers, Denmark

Re: Training, by W. G. George, 1902

Post by Heelgrasper » 13 Mar 2012, 06:28

You might find this tutorial to ScanTailor helpful if you don't know it already: http://vimeo.com/12524529
---
Jakob Øhlenschlæger
Randers, Denmark

The past is a foreign country: they do things differently there
L. P. Hartley

vitorio
Posts: 138
Joined: 30 Oct 2010, 23:56
Number of books owned: 0
Location: Austin, Texas, USA

Re: Training, by W. G. George, 1902

Post by vitorio » 13 Mar 2012, 20:39

Here are the scans if you'd like to try playing with them yourself:
The page dimensions are 7×4.5 inches.

rob
Posts: 773
Joined: 03 Jun 2009, 13:50
E-book readers owned: iRex iLiad, Kindle 2
Number of books owned: 4000
Country: United States
Location: Maryland, United States

Re: Training, by W. G. George, 1902

Post by rob » 13 Mar 2012, 22:00

vitorio wrote: Specifying approximate DPIs for the three sets gets me pretty close, but if you flip between pages 2 and 6 you'll see the page number at the top and the margins are off by, I dunno, 10-20 pixels from each other. Maybe I can just get the DPI precisely right and it'll just work.
Well, it depends. Page 2 (number 36, an even page) and page 6 (number 99, an odd page) are from different cameras, so they will definitely have slightly different DPIs.

So given the pages are 7 inches long, I should be able to measure the pixels for the page at various points from the first image to the last image, and do a quick binary search to find exactly where the DPI changed. Unfortunately, the 4.5 inches doesn't help, because that would be to the spine, which you usually can't see clearly in the image. But if we have at least one dimension, that should be enough.
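
That binary search can be sketched in shell, assuming there is a single jump in size and that a user-supplied `measure` command (hypothetical, not defined here) prints a page's height in pixels for a given file. With 7-inch pages, 468 dpi is about 3276 pixels and 492 dpi about 3444, so a threshold midway between them, such as 3360, separates the two sets.

```shell
# first_changed THRESHOLD FILE...
# Prints the first FILE whose measured height exceeds THRESHOLD, assuming all
# files before the jump measure below it and all files from the jump on
# measure above it. Calls the hypothetical `measure FILE` only O(log n) times.
first_changed() {
    threshold=$1; shift
    lo=1; hi=$#
    # Invariant: the first "large" file is at a position in lo..hi (1-based).
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        eval "file=\${$mid}"          # file = positional parameter number $mid
        if [ "$(measure "$file")" -gt "$threshold" ]; then
            hi=$mid                   # jump is at mid or earlier
        else
            lo=$(( mid + 1 ))         # jump is after mid
        fi
    done
    eval "file=\${$lo}"
    printf '%s\n' "$file"
}
```

In a directory of left-hand images, `first_changed 3360 DSC*.JPG` would print the first file of the larger set. If no file exceeds the threshold it prints the last file, so that one is worth checking by hand.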

Ideally, though, you want to measure the text block itself -- from the baseline of the first line to the baseline of the last line. Then, even if the pages have slightly different sizes (not unheard of), your output will be perfect -- in fact, better than the printed version.

I'm downloading the images now, so we'll see what I can come up with.

rob
Posts: 773
Joined: 03 Jun 2009, 13:50
E-book readers owned: iRex iLiad, Kindle 2
Number of books owned: 4000
Country: United States
Location: Maryland, United States

Re: Training, by W. G. George, 1902

Post by rob » 14 Mar 2012, 00:03

Okay, so first I looked at the right images, and I measured the page edge using Photoshop. I measured as best I could, since in most cases it's difficult to discern the edge of the page. This is why I like to use text for measurement :)

Anyway, I found that the DPI for the right image hardly varied at all, well within the bounds of measurement error. I got 450 dpi.

For the left images, I found that between DSC00282 and DSC00283 there was a jump in size. DSC00282 and prior were 468 dpi, while DSC00283 and on were 492 dpi.

So first I stuck the low-dpi left images into a separate directory, and ran this ImageMagick command:

Code:

# run inside the directory that holds the low-dpi left images
mkdir -p rotated
for i in *.JPG; do convert "$i" -rotate -90 -density 468x468 +repage "rotated/$i"; done
and then for the high-dpi left images, I did the same except with 492x492.

For the right images, the same except -rotate 90 and density 450x450. Now I had right/rotated images and left/rotated images.
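
The three passes differ only in the directory, the rotation, and the density, so they could be wrapped in one function. This is a sketch: the name `rotate_with_density` and the directory names in the usage line are hypothetical, and it assumes ImageMagick's `convert` is on the PATH.

```shell
# rotate_with_density SRC_DIR DEGREES DPI
# Rotates every .JPG in SRC_DIR by DEGREES, tags it with DPIxDPI, and writes
# the result to SRC_DIR/rotated/, leaving the originals untouched.
rotate_with_density() {
    src=$1 deg=$2 dpi=$3
    mkdir -p "$src/rotated" || return 1
    for f in "$src"/*.JPG; do
        [ -e "$f" ] || continue   # no matches: the glob stays literal, skip it
        convert "$f" -rotate "$deg" -density "${dpi}x${dpi}" +repage \
            "$src/rotated/$(basename "$f")" || return 1
    done
}
```

For example: `rotate_with_density left_low -90 468`, `rotate_with_density left_high -90 492`, and `rotate_with_density right 90 450`.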

Next, I renamed the images to 001.jpg, 002.jpg, and so on, interleaving right with left. The right images come first, so the first right image became 001.jpg, the next 003.jpg, and so on. I use an OSX program called Name Mangler to do this.
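
A rough shell stand-in for that Name Mangler step (a hypothetical helper, not what rob ran): it copies right-hand images to odd numbers and left-hand images to even numbers, in sorted filename order. It assumes each directory holds only that side's images, and it writes upper-case `.JPG` names since the crop loops in this post glob for `*.JPG`. Copying rather than renaming keeps the rotated originals intact.

```shell
# interleave RIGHT_DIR LEFT_DIR OUT_DIR
# Right pages become 001.JPG, 003.JPG, ...; left pages become 002.JPG, 004.JPG, ...
interleave() {
    right=$1 left=$2 out=$3
    mkdir -p "$out" || return 1
    n=1
    for f in "$right"/*.JPG; do
        [ -e "$f" ] || continue
        cp "$f" "$out/$(printf '%03d' "$n").JPG" || return 1
        n=$((n + 2))
    done
    n=2
    for f in "$left"/*.JPG; do
        [ -e "$f" ] || continue
        cp "$f" "$out/$(printf '%03d' "$n").JPG" || return 1
        n=$((n + 2))
    done
}
```

For example, `interleave right/rotated left/rotated book` would fill a new `book` directory.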

Next thing was to crop out those focus targets. I chose a random page (they should all be roughly the same size) and determined that rows 618 to 3972 (a vertical size of 3354 pixels) would be kept for odd pages, and rows 612 to 4056 (a vertical size of 3444 pixels) for even pages:

Code: Select all

for i in *[13579].JPG; do echo $i; convert $i -crop 0x3354+0+618 +repage cropped/$i; done
for i in *[02468].JPG; do echo $i; convert $i -crop 0x3444+0+612 +repage cropped/$i; done
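
The crop geometry is simple arithmetic: the kept height is the last row minus the first row, and the first row becomes the vertical offset. A tiny hypothetical helper makes that explicit; the zero width is copied from the commands above, where it keeps the full image width.

```shell
# crop_geom FIRST_ROW LAST_ROW
# Prints a geometry string that keeps rows FIRST_ROW..LAST_ROW at full width.
crop_geom() {
    printf '0x%d+0+%d\n' "$(( $2 - $1 ))" "$1"
}
```

It could then be used as `convert "$i" -crop "$(crop_geom 618 3972)" +repage "cropped/$i"`.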
Okay, now ScanTailor has clean, oriented, sized images to work with. I told ST that all the pages should be 1.5 pages, and to deskew everything. Then I had to go to a few of the pages without text (and the one page with the library stamp which wasn't straight) and straighten out the pages manually.

Next, I let ST select content. This is, of course, where cropping out the focus targets helped immensely. I scrolled through the pages and corrected those that looked wrong. One key point to understand is that ST will output a uniform book page size that is no smaller than the largest content area, so any page that looks like it has too much content selected must be corrected.

Next, I let ST run margins. You want most of the pages to have "top" alignment because the page numbers on your pages are at the top. This way the book will look uniform. For pages that don't start at the top, you have to manually fix the alignment. For example, the beginnings of chapters have no page numbers, and should align to the bottom.

(to be continued: it's bedtime!)

vitorio
Posts: 138
Joined: 30 Oct 2010, 23:56
Number of books owned: 0
Location: Austin, Texas, USA

Re: Training, by W. G. George, 1902

Post by vitorio » 14 Mar 2012, 01:58

rob wrote: Okay, so first I looked at the right images, and I measured the page edge using Photoshop. I measured as best I could, since in most cases it's difficult to discern the edge of the page. This is why I like to use text for measurement
Ah, I get it.

The full-page text blocks, from the top of the page number to the bottom of the descenders on the last line (measured on pages 20, 21, 120 and 121), seem to be 13.9cm tall and 7.7cm wide.

From baseline of the first line of body text (not the page number) to baseline of the last line (measured on pages 20, 21, 92 and 93, as 120 and 121 seem to have slightly different typesetting), it's 12.8cm tall.

I notice you're not differentiating between width DPI and height DPI. Surely they're different? I tried specifying different ones in Scan Tailor, though, and it stretched the pages out.

I feel like what I really want Scan Tailor to do is ignore the DPI and just output the content blocks at whatever scale necessary to make them all the same pixel dimensions. :)
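
This is roughly what correct DPI tags buy you: once each set is tagged with its true resolution, the same physical distance maps to the same number of output pixels whatever the input resolution was. A sketch of that arithmetic, with hypothetical helper names, using the 12.8 cm baseline-to-baseline measurement above. `resize_pct` only matters if you choose to resample a set yourself (for example with ImageMagick's `-resize`) instead of letting Scan Tailor scale it.

```shell
# out_px CENTIMETRES OUTPUT_DPI
# Prints how many output pixels a physical distance occupies at OUTPUT_DPI.
out_px() {
    awk -v cm="$1" -v dpi="$2" 'BEGIN { printf "%.0f\n", cm / 2.54 * dpi }'
}

# resize_pct INPUT_DPI OUTPUT_DPI
# Prints the scale percentage that takes INPUT_DPI images to OUTPUT_DPI.
resize_pct() {
    awk -v i="$1" -v o="$2" 'BEGIN { printf "%.4f\n", o / i * 100 }'
}
```

For example, `out_px 12.8 600` prints 3024 for every set, and `resize_pct 468 600` prints 128.2051 for the 468 dpi set.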
rob wrote: I told ST that all the pages should be 1.5 pages, and to deskew everything.
What does "1.5 pages" mean? Is this the page split type?
rob wrote: Next, I let ST select content. … Next, I let ST run margins.
OHHH. So, you're hitting the batch process play button first, and then correcting issues, rather than doing them one at a time? That sounds so much faster.