Laser Scanning Results

duerig
Posts: 388
Joined: 01 Jun 2014, 17:04
Number of books owned: 1000
Country: United States of America

Re: Laser Scanning Results

Post by duerig »

One other workaround occurs to me. We are aligning line-focused lasers here, which means that we could align (and turn on) multiple lasers along the same axis. So if a single laser isn't bright enough, we could use a pair of lasers aligned along each axis instead of just one. That would roughly double the brightness of the line.
dtic
Posts: 464
Joined: 06 Mar 2010, 18:03

Re: Laser Scanning Results

Post by dtic »

This may be early, but do you have any ideas on how these new laser methods could move things toward the "holy grail" of automated scanning? I'm thinking that in manual use the delay from "a laser step" may in the worst case be roughly the same as the delay of raising/lowering a platen, so no time is saved. But if a laser setup can make automation happen, on the other hand... The Japanese research on high-speed laser scanning https://www.youtube.com/watch?v=03ccxwN ... -eQcsv7nZg has a pretty simple setup. Maybe you can team up with jck57 and build something like that (book curved, on a tilted plane, actuator that releases one page at a time, a fan that blows on pages, lasers) but without the super high speed.
duerig
Posts: 388
Joined: 01 Jun 2014, 17:04
Number of books owned: 1000
Country: United States of America

Re: Laser Scanning Results

Post by duerig »

Currently in manual use, the laser shot does double the time cost. As you mentioned before, we can probably remove that penalty with CHDK. And there is an equivalent in gphoto that will let me take the extra photo in just 200 milliseconds. This is something I'll have to figure out as I move from testing to trying to make it ready for 'real' scanning.

I had not thought about how this would be used in an automated setup. But if you have a page-turning apparatus of any kind, it should work just fine. Removing the need for a platen seems like a potential win there. And as Daniel has said in the past, once you have full auto, it doesn't matter how long each shot takes, so I wouldn't even bother with trying for a super high speed setup.
dtic
Posts: 464
Joined: 06 Mar 2010, 18:03

Re: Laser Scanning Results

Post by dtic »

duerig wrote: if you have a page-turning apparatus of any kind, it should work just fine.
jck57 has several very impressive prototypes and keeps working on it. I'll PM him in case he hasn't seen this thread.
duerig
Posts: 388
Joined: 01 Jun 2014, 17:04
Number of books owned: 1000
Country: United States of America

Re: Laser Scanning Results

Post by duerig »

One more quick update. Since I've got this running so fast now on my laptop (1m5s for 4 double-page scans), I decided to try running it on a Raspberry Pi to see if it would be feasible to process on the Pi and not just capture.

Alas, it got killed for consuming too much memory after about 3 minutes. So for now, you will need a separate PC to run the processing on. I am sure there are many ways to economize on memory in the current script, so maybe this will become possible in the future.
dpc
Posts: 379
Joined: 01 Apr 2011, 18:05
Number of books owned: 0
Location: Issaquah, WA

Re: Laser Scanning Results

Post by dpc »

Help me to understand the benefit of the lasers. I had thought that using the laser and the subsequent image removed the need for a lot of the other scanner pieces, since you're turning pages by hand and there's no need for a platen? You should be able to get much higher throughput by hand-turning pages without having to raise and lower a platen, and that would be the big advantage of your post-processing. I believe this is what Google did with some of their early book digitization efforts (i.e. hand turning and later removing thumbs). You lose some of this improved throughput by having to shoot twice, and turning the lights on and off for each shot may actually have some negative effects, like shortening the bulb life.

I looked at doing post-processed page flattening a few years ago but didn't use lasers. I pointed a third camera at the edge of the book against a green background to get a profile of the curvature for each set of pages I shot from above. (Imagine a camera at your belly button pointed toward the bottom edge of the book - there are posts about this method in the forums here somewhere.) The thumbs were removed by simply block filling a rect along the outside margins. The #1 reason that I chose to not pursue this method was image quality. Flattening the page gives much better image results for what I need. I'm not trying to scan 10,000 books so the added work of raising/lowering the platen in order to get a higher quality image was worth the trouble to me.

The processed images you've posted, while very impressive considering the curved source image, are still below the quality of image you'd get if the page were flattened under glass. If the end goal is OCR'ed text, then it may not matter, but if you're trying to archive the contents of the book accurately and produce something like a PDF, the platen-based scanners produce more acceptable results. If you're thinking of using your current method with a fully automated page-turning scanner that may take hours to scan a book (i.e. time is not a factor), then why wouldn't you use a platen-based scanner to acquire higher quality images?

I'm not trying to be negative here. I'm just trying to understand what advantages your method has over some of the others. Please don't let my comments deter you from your research. It's fascinating to watch others here solve shared problems in novel ways and I've enjoyed reading this thread as you've charted your progress.
duerig
Posts: 388
Joined: 01 Jun 2014, 17:04
Number of books owned: 1000
Country: United States of America

Re: Laser Scanning Results

Post by duerig »

There are four kinds of benefits that you might be able to get with this kind of scanning when compared with a standard scanner:

Low-cost: A laser-based scanner involves a lot fewer pieces compared to a standard scanner like the Archivist. This means that it is easier to put one together and cheaper to produce. While the Archivist kit is $1000 right now, an equivalent laser-scanner kit might end up being much less expensive. If somebody does manage to make a full-auto scanner, a laser-based version of it might also be cheaper than a more complicated platen design.

Portability: Given the lack of a platen, it might be possible to make a better portable scanner. This is partly because it won't be as fragile and partly because it reduces the overall complexity of the physical build. There are portable builds already, but not having to worry about the platen might open up new possibilities here.

Automatic post-processing: Thus far when scanning, I have found that post-processing is more time-intensive than actually photographing the book. Current post-processing software like ST or BSW involves a fair amount of manual intervention. I've used ST a lot and it is quite accurate when it comes to finding the spine or cropping to content. But there are still enough things that need manual correction that it takes a long time for me to make the book look as good as I'd like. So one thing I hope to achieve with the lasers is to make the whole process from the time you stop photographing to the PDF or DJVU file totally automated with no need to intervene manually.

Small, Close-bound, or Delicate Books: I have a lot of small paperback books that are almost impossible to scan with a normal platen. Every time you lift the platen, the book closes on its own and you need to find your place to scan the next page. Even the old scanner with 'paperback mode' looks very cumbersome to scan this way. I'd much rather be able to hold it open and turn pages to scan. I've also had a few books where the inside margins end up very close to the spine of the book. With these books, the place where the glass sides of the platen meet can interfere with capturing the text. Finally, even a v-shaped platen can be relatively hard on a book because it is being repeatedly pressed against the glass. For my more delicate books, I am hesitant to scan them this way. Overall, I expect the laser-scanning method to work better with small books and a platen-based method to work better with large books. And I think that there will be a good amount of overlap as they are both decent at capturing medium-sized books.

I'm optimistic enough about at least some of these benefits becoming a reality that I am excited about pursuing this line of inquiry.

Regarding quality, I think that there will be a lot of variation. The quality I've been able to achieve is good enough that I would be happy to have all my books scanned like this. But it is quite possible that the best platen-based scans are always better than the laser-based scans and should be used when quality is most important.
duerig
Posts: 388
Joined: 01 Jun 2014, 17:04
Number of books owned: 1000
Country: United States of America

Re: Laser Scanning Results

Post by duerig »

In the current test rig, I was hopeful that the guide laser would be enough to end skew problems. But after a larger test batch, I see that it is not sufficient. Steadying the book with my hands, even with a guide laser, was not very precise. I was off by more than a degree on several scans, which caused noticeable skew in the text. I will have to think about how I can handle both deskewing and dewarping in a single step, since it doesn't work to do one and then the other independently.

Here are some angle measures from my test (-90 degrees is perfectly straight):

Dewarping images/ad/015.jpg
-89.3729850369
Dewarping images/ad/013.jpg
-90.5085869381
Dewarping images/ad/005.jpg
-88.5262292314
Dewarping images/ad/008.jpg
-89.4167580543
Dewarping images/ad/002.jpg
-90.7327906115
Dewarping images/ad/007.jpg
-89.2234188966
Dewarping images/ad/003.jpg
-89.8483740767
Dewarping images/ad/022.jpg
-91.1634098318
Dewarping images/ad/012.jpg
-90.6253042807
Dewarping images/ad/011.jpg
-90.0779799172
Dewarping images/ad/019.jpg
-90.1178519685
Dewarping images/ad/009.jpg
-89.9222581488
Dewarping images/ad/017.jpg
-90.3139463348
Dewarping images/ad/010.jpg
-90.3506668324
Dewarping images/ad/004.jpg
-88.8759374655
Dewarping images/ad/021.jpg
-89.9598347213
Dewarping images/ad/016.jpg
-89.5287636387
Dewarping images/ad/014.jpg
-90.2347372849
Dewarping images/ad/020.jpg
-89.5588091579
Dewarping images/ad/006.jpg
-88.5247316511
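
For anyone who wants to summarize a log like the one above, here is a minimal sketch in Python. It is not part of the dewarping script; the names ANGLES, STRAIGHT, and skew_report are made up for illustration, and the one-degree threshold is only taken from the "off by more than a degree" remark. The angles themselves are copied from the log, with the images/ad/ prefix dropped for brevity.

```python
# Angles copied from the log above, keyed by file name (hypothetical layout).
ANGLES = {
    "015.jpg": -89.3729850369, "013.jpg": -90.5085869381,
    "005.jpg": -88.5262292314, "008.jpg": -89.4167580543,
    "002.jpg": -90.7327906115, "007.jpg": -89.2234188966,
    "003.jpg": -89.8483740767, "022.jpg": -91.1634098318,
    "012.jpg": -90.6253042807, "011.jpg": -90.0779799172,
    "019.jpg": -90.1178519685, "009.jpg": -89.9222581488,
    "017.jpg": -90.3139463348, "010.jpg": -90.3506668324,
    "004.jpg": -88.8759374655, "021.jpg": -89.9598347213,
    "016.jpg": -89.5287636387, "014.jpg": -90.2347372849,
    "020.jpg": -89.5588091579, "006.jpg": -88.5247316511,
}

STRAIGHT = -90.0  # the post states that -90 degrees is perfectly straight


def skew_report(angles, threshold=1.0):
    """Return (deviations, worst file, files beyond threshold).

    deviations maps each file to its signed offset from STRAIGHT in degrees.
    """
    deviations = {name: angle - STRAIGHT for name, angle in angles.items()}
    worst = max(deviations, key=lambda name: abs(deviations[name]))
    over = sorted(n for n, d in deviations.items() if abs(d) > threshold)
    return deviations, worst, over


if __name__ == "__main__":
    deviations, worst, over = skew_report(ANGLES)
    print("worst:", worst, round(deviations[worst], 4))
    print("more than one degree off:", over)
```

On this batch the sketch flags four scans (004, 005, 006, and 022) as more than one degree off, with 006 the furthest from straight.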
duerig
Posts: 388
Joined: 01 Jun 2014, 17:04
Number of books owned: 1000
Country: United States of America

Re: Laser Scanning Results

Post by duerig »

I've managed to adapt the technique, adding three layers of deskewing.

First, I detect from a background-laser shot (with no book) at the beginning exactly how much the lasers are skewed relative to the camera. I then use this to apply a tiny pre-deskew to everything.

Second, during dewarping, I deskew based on spine angle, dewarp to a non-horizontal basis line, and correct for the deskewing in the laser derivative calculation.

Finally, I run a content-based deskewing after the dewarp to compensate for any minor text skew remaining from sloppy bookbinding or a page held slightly crooked relative to the spine.
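
To make the first layer more concrete, here is a rough sketch of how the laser skew could be measured from a background-laser shot and then undone. This is an assumption-laden illustration, not the actual script: it fits the dominant direction of the laser pixels with a principal-axis (SVD) fit, which may differ from what the real code does, and the function names line_angle_degrees and rotate_points are hypothetical. It also measures angles from the horizontal image axis rather than using the -90 degree convention from the earlier log.

```python
import numpy as np


def line_angle_degrees(mask):
    """Angle of the dominant straight line in a boolean mask, in degrees.

    0 means the line runs along the image rows. Image y grows downward,
    so a positive angle means the line drops toward the right.
    """
    ys, xs = np.nonzero(mask)
    if xs.size < 2:
        raise ValueError("need at least two laser pixels")
    pts = np.column_stack([xs, ys]).astype(float)
    pts -= pts.mean(axis=0)
    # The first right singular vector is the direction of greatest spread.
    _, _, vt = np.linalg.svd(pts, full_matrices=False)
    dx, dy = vt[0]
    if dx < 0:  # the sign of a singular vector is arbitrary; point it rightward
        dx, dy = -dx, -dy
    return float(np.degrees(np.arctan2(dy, dx)))


def rotate_points(points, angle_degrees, center=(0.0, 0.0)):
    """Rotate (x, y) points by angle_degrees about center."""
    theta = np.radians(angle_degrees)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    center = np.asarray(center, dtype=float)
    return (np.asarray(points, dtype=float) - center) @ rot.T + center


if __name__ == "__main__":
    # Synthetic background shot: one laser line with a slope of 0.02.
    mask = np.zeros((100, 1000), dtype=bool)
    xs = np.arange(1000)
    mask[50 + np.round(0.02 * xs).astype(int), xs] = True
    skew = line_angle_degrees(mask)
    print("estimated laser skew in degrees:", skew)
    # Pre-deskew: rotate coordinates by the opposite angle.
    print(rotate_points([[100.0, 2.0]], -skew))
```

In a real pipeline the mask would come from thresholding the laser color in the background shot, and the same rotation would be applied to the image itself rather than to a list of points.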

Here is a pure-deskew test where I scanned the same page at various skewed angles (and perfectly straight) to test the technique: https://www.flickr.com/photos/126962164 ... 636681754/

Here is a larger-scale set: 36 pages scanned from a small book. Except for possible binarization or sharpening/contrast tweaks, these seem ready to bind together into a PDF: https://www.flickr.com/photos/126962164 ... 640380493/
duerig
Posts: 388
Joined: 01 Jun 2014, 17:04
Number of books owned: 1000
Country: United States of America

Re: Laser Scanning Results

Post by duerig »

I scanned the first complete book with the new laser techniques. The new taller frame meant that the light was much more even and so it was pretty amenable to binarization. I don't have the space to upload the two gigabytes of raw images, but here is the entire book binarized into an 11 MB .djvu file:

https://files.app.net/wxms0ua5k

I want to try it out on a modern book with more regular typeface and better binding next. I'll look around for something that is in the public domain.