Methods To Sense The 3D Surface/Structure Of A Book

Post by **daniel_reetz** » 05 Jan 2011, 07:35

This thread is about the various ways we could "see" the 3D structure of a book, so we could potentially do perfect dewarping. I'm particularly interested in recovering the 3D shape of a book, not just flat documents, so this list will be heavily biased in that direction. For people unfamiliar with the topic, dewarping is taking an image of a curved page or of an entire book, and using some algorithm to make the image "flat" -- as though it had been scanned on a perfectly flat surface. Right now, DIY Book Scanners can be considered "dewarping in hardware", because the flat platen glass flattens the page.

I've long maintained that image-based dewarping is a flawed solution, because books are all so different. However, with the advent of improved algorithms in Scan Tailor and direct 3D sensing like the Microsoft Kinect/PrimeSensor, I've changed my mind and think it's time to take a fresh look at dewarping technologies. Eventually, I would like this thread to be a canonical resource on the topic, so I will continuously update this post as I learn new things. Also, this is a hardcore topic involving math, cameras, computer science, etc., so it has significant academic interest. As a result, some of the best information is locked up in academic journals, and some of the reading will be hard. For an overview of the state of the art in dewarping (as of a few years ago), see this document. To see where things have gone since then, see this Google Scholar search.

Because this post is to be an information resource for the community, I'd prefer that comments in this particular thread be informational. That means that if you know of another technique or program, please post it and I will probably add it to the list. If you are wondering if dewarping is a good idea, please post that somewhere else. To be clear, comments on the specifics of an algorithm, your ideas and so on, are absolutely requested and desired, and if you want to start working on one of these here, go right ahead. BUT if you want to talk about these things in some general sense, let's do that somewhere else.

Edit: Updated Google Scholar link, thanks for the help, Mark Main.

Post by **daniel_reetz** » 05 Jan 2011, 07:36

Feel free to comment with new ideas or better resources.

1. Look at the lines of text or borders of images on a page and extract the page curvature from them.

Apps that do this:
Scan Tailor
(is it still using coupled snakes?)
I know there are other examples of this technique; does anyone remember?

For flat documents, there is a similar approach by unpaper.
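To make the idea concrete (this is not Scan Tailor's actual algorithm), suppose some text-line detector has already produced sample points along one curved baseline. A minimal sketch, assuming a quadratic baseline model: fit y = a + b·x + c·x² by least squares, then compute for each column the vertical shift that would move the fitted curve onto a straight line. The function names and the quadratic model are hypothetical choices for illustration.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2 to the points; returns (a, b, c)."""
    # Normal equations (A^T A) p = A^T y, written as an augmented 3x4 matrix.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def straightening_shifts(xs, ys, columns):
    """Amount to add to the y coordinate in each column so the fitted
    baseline becomes horizontal at its height in the first column."""
    a, b, c = fit_quadratic(xs, ys)
    curve = [a + b * x + c * x * x for x in columns]
    return [curve[0] - y for y in curve]
```

This only removes vertical displacement; it does not correct the horizontal foreshortening of a page that curves away from the camera, which is one reason real dewarpers combine several cues.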

2. Look at the borders of the book and extract curvature that way.

Apps that do this:
Atiz Snapter.
As far as I can see, Snapter is currently unavailable. My personal testing found it to be totally unreliable.

3. Project an infrared pattern on the page, photograph it in infrared, in stereo, convert stereo IR information to 3D, and then dewarp.

Apps that do this: None publicly available. This is the Google Books scanning method.
Article about their technique.

Examples (all of Google Books)
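Whatever pattern is projected, the core step in any stereo approach is finding, for a point in one image, the matching point in the other. A minimal sketch of that step, assuming a rectified pair (matching points lie on the same row) and using plain sum-of-absolute-differences block matching on a single scanline. This is a generic textbook technique, not a description of Google's published method, and `match_row`, `half` and `max_disp` are hypothetical names.

```python
def match_row(left, right, x, half=2, max_disp=16):
    """Disparity at column x for one rectified scanline pair.

    Compares a window of width 2*half+1 around left[x] with windows shifted
    leftward in the right image, and returns the shift with the smallest
    sum of absolute differences (SAD)."""
    if x - half < 0 or x + half >= len(left):
        raise ValueError("window around x does not fit inside the scanline")
    window = left[x - half:x + half + 1]
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        lo = x - d - half
        if lo < 0:
            break  # candidate window would fall off the right image
        cand = right[lo:lo + 2 * half + 1]
        cost = sum(abs(a - b) for a, b in zip(window, cand))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

A projected pattern helps precisely because it gives every window distinctive content, so the SAD minimum is unambiguous even on blank paper.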

4. Use two cameras to photograph both pages with overlapping information. Use this stereo pair to determine 3D structure for dewarping.

Apps that do this: Decapod (it's not clear that there is presently any implementation of this in Decapod, but it was the original idea). From their wiki, it appears that right now the two cameras are treated independently and "calibration" consists of simply rotating each camera into position.
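Once matches are found, depth follows from the standard pinhole relation for a rectified, parallel stereo pair: Z = f·B/d, where d is the disparity in pixels, f the focal length in pixels, and B the distance between the cameras. A minimal sketch; the numbers in the usage line are made up, and real values would come from calibrating the rig.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_mm):
    """Depth (mm) of a point seen in a rectified, parallel stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_mm / disparity_px


def depth_step(depth_mm, focal_px, baseline_mm):
    """Approximate depth change for a one-pixel change in disparity,
    dZ ~= Z**2 / (f * B): a wider baseline gives finer depth steps."""
    return depth_mm ** 2 / (focal_px * baseline_mm)


# Hypothetical rig: f = 3000 px, B = 100 mm, a match 600 px apart.
z = depth_from_disparity(600, 3000, 100)   # 500.0 mm
```

Since the Decapod setup rotates each camera into position independently, a real pair would first need calibration and rectification before this relation applies.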

5. Using the Kinect for direct depth sensing of the book surface.

Apps that do this: Not exactly an app, but the libfreenect/OpenKinect driver gives the depth image.
Rob proposed the idea here and I got the first few depth images of books here -- there's a long way to go on this project and we could use a little help to see if the data straight from the device are worthwhile. It may also be possible to get a close-range PrimeSensor. I will be contacting PrimeSense to feel out the possibilities.
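Whatever device supplies the depth image, a usual first step is back-projecting each depth pixel into a 3D point with the pinhole model: X = (u − cx)·Z/fx, Y = (v − cy)·Z/fy. A minimal sketch, assuming the depth values have already been converted to millimetres (whether the driver does this for you depends on the depth format you request) and that the intrinsics fx, fy, cx, cy are known from calibration; the function name is hypothetical.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (rows of millimetre values, 0 = no reading)
    into a list of (X, Y, Z) points in the camera frame."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # sensor reported no depth at this pixel
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points
```

The resulting point cloud is the raw material for checking whether the device's data are good enough: fitting a plane to a flat page, for example, shows the noise level directly.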

6. Using Sharp sensors for extracting the curvature at several lines on a page.

Spamsickle proposed this here. At first I didn't like the idea, but after discussing it more with Spam and Rob, I have come to really like it: it is simple, efficient, and might work (if only the Sharp sensors weren't so awfully noisy/messy). I have the Sharp sensors lying around in a box and just need to build a rig for testing. The idea right now is to have a rod extending over the book with two of these sensors. By sweeping them across the surface of the book, you'd measure the distance to the page directly along each sensor's path.
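A minimal sketch of turning one sweep into a distance profile, assuming an analog IR ranger whose output voltage falls roughly inversely with distance. The model distance = A / (V − V0) and the constants below are hypothetical placeholders, to be fitted per sensor against a flat target at known distances; they are not datasheet values. Taking the median of a short burst of readings at each position is one simple way to tame the noise mentioned above.

```python
import statistics

# Hypothetical calibration constants; fit these for your own sensor.
A_MM = 270.0   # scale of the inverse model
V0 = 0.1       # voltage offset


def voltage_to_mm(volts):
    """Approximate distance from sensor voltage: distance = A / (V - V0)."""
    if volts <= V0:
        raise ValueError("voltage at or below offset: target out of range")
    return A_MM / (volts - V0)


def sweep_profile(samples):
    """samples: list of (position_mm, [voltage readings]) taken while the
    sensor moves across the page. Returns (position, distance) pairs, using
    the median of each burst so single noise spikes are ignored."""
    return [(pos, voltage_to_mm(statistics.median(readings)))
            for pos, readings in samples]
```

Subtracting each distance from the sensor's mounting height turns the profile into page height along the sweep.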

7. Using a laser line to get a reliable line to follow for dewarping.

A laser pointer or diode can easily be made into a laser line by using a cylinder lens to spread the beam. The laser line, when projected on the book surface, distorts according to the page curvature. Using this laser line, we should be able to make a good guess at the 3D structure of the page and do dewarping. Or perhaps we could make a modified version of Scan Tailor that searches for bright lines. In any case, it is a promising area of research suggested by many, including Rob, Vitorio, and me.
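The geometry behind this is simple triangulation. Under simplifying assumptions (camera looking straight down with a roughly orthographic view, laser sheet arriving from the side at a known elevation angle θ above the table), a page point raised by height h intercepts the sheet early, shifting the line toward the laser by Δx = h / tan θ. So h = Δx · tan θ. A minimal sketch; the angle and the millimetres-per-pixel scale are assumed calibration inputs.

```python
import math


def heights_from_shift(shifts_px, mm_per_px, laser_elevation_deg):
    """Convert the laser line's displacement (pixels, measured toward the laser
    from where it falls on the flat table) into surface height in mm,
    using h = dx * tan(theta)."""
    t = math.tan(math.radians(laser_elevation_deg))
    return [dx * mm_per_px * t for dx in shifts_px]
```

A shallower angle gives a larger shift per millimetre of height (better sensitivity), at the cost of deeper shadows in the gutter where the line disappears.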

I decided to try this out this morning (got up at 1AM, couldn't sleep!) and the results looked very promising.

I didn't have any cylinder lenses lying around (aaghhh!!!), so I took a piece of "turning film" from the back of a cellphone display and put it in front of the laser pointer.

Laser pointer by itself:

Laser pointer plus turning film:

Then I pointed the laser toward the book from the side. Projected from straight down, the laser line would obviously appear straight. However, if we project it from the side, we get something like this (actually, this is two photographs of two projections superimposed on each other):

Laser image by itself (it's noisy because I used the wrong camera settings but didn't care to take the image a second time)

Image of the book:

Laser beams superimposed on book:

High-res images.

OK, the laser line is not perfect because of the nature of turning film. A brighter laser with a better lens would give much better results. If you had two lasers, you could take just two shots: a laser-line shot and a normal shot. Using the info from the two, you could obviously dewarp the page. I think this method is a winner: cheap, handy, and it uses a single camera and a handful of solid-state parts. Books that can lie flat are easy targets; I'm not so sure about books in a cradle (that's up next).
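The two-shot idea can be sketched as follows, assuming two grayscale frames from the same fixed camera (one with the lasers on, one with them off), stored as lists of rows. Subtracting them cancels the printed page content, and the brightest remaining row in each column marks the line. The function name and the `min_contrast` threshold are hypothetical.

```python
def laser_line_rows(laser_on, laser_off, min_contrast=20):
    """For each column, return the row where (on - off) is largest,
    or None if the laser is not clearly visible in that column."""
    rows, cols = len(laser_on), len(laser_on[0])
    positions = []
    for c in range(cols):
        diff = [laser_on[r][c] - laser_off[r][c] for r in range(rows)]
        best = max(range(rows), key=lambda r: diff[r])
        positions.append(best if diff[best] >= min_contrast else None)
    return positions
```

Comparing these row positions with where the line falls on a flat surface gives the per-column displacement that a triangulation step turns into height. A blurry line from turning film could be located more precisely by taking a brightness-weighted centroid around the peak instead of the single brightest pixel.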

8. Using depth-from-defocus.

This technique is a bit subtle. Essentially, it assumes that whatever is in focus in a picture with shallow DoF lies in one plane. By shifting the focus through a scene, the depth of each object can be recovered by watching for high-frequency information. Unfortunately, this method struggles with compact cameras because they do not have shallow DoF, and it fails in general because not all book pages contain high-frequency content. An additional problem is that it requires many photographs of each page. EVEN SO, I was very, very excited to see Gerard try out this technique here, with the help of Spamsickle. They did some great work, and I hope we end up trying all of these to at least that level.
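The procedure described here (sweeping focus and picking the sharpest frame) is often called depth from focus. A minimal sketch, assuming a stack of aligned grayscale frames with a known focus distance for each: measure sharpness at each pixel with the absolute discrete Laplacian, and keep the focus distance of the sharpest frame. Pixels where no frame has enough high-frequency content are left as None, which is exactly the failure mode on blank page areas. The names and threshold are hypothetical.

```python
def laplacian_abs(img, r, c):
    """Absolute 4-neighbour Laplacian at an interior pixel, used as a
    simple high-frequency (sharpness) measure."""
    return abs(4 * img[r][c] - img[r - 1][c] - img[r + 1][c]
               - img[r][c - 1] - img[r][c + 1])


def depth_from_focus(stack, focus_mm, min_sharpness=1.0):
    """stack: aligned frames (lists of rows); focus_mm: focus distance of each.
    Returns a depth map; border pixels and textureless pixels are None."""
    rows, cols = len(stack[0]), len(stack[0][0])
    depth = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            scores = [laplacian_abs(f, r, c) for f in stack]
            k = max(range(len(stack)), key=lambda i: scores[i])
            if scores[k] >= min_sharpness:
                depth[r][c] = focus_mm[k]
    return depth
```

In practice the sharpness would be summed over a small window around each pixel to reduce noise, at the cost of spatial resolution.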

9. Using a coded aperture camera.

There is a new field called "computational photography" and many of the imaging schemes for CP inherently recover a depth map. Coded aperture imaging is explained here. I am building a coded aperture camera for other reasons, but honestly expect the depth resolution to be too coarse for book scanning.

10. Using RGB lighting to get the curvature of the book.

This is an idea I had just a week or so ago. If you mix red, green, and blue light, you get white. White light is nice for scanning books, so we're already +1. Now, if you put the three lights at different points in space, anything that blocks one of them will cast a colored shadow. In this way, you can make colored shadows that reflect the shape of the book edge, and also identify the orientation of the lighting relative to the book. I think pictures show this idea best, so I mocked it up in Maya:
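Reading the colored shadows back out of a photo could start with something like the following minimal sketch. It assumes a fixed camera, separate red, green and blue lights, and a reference frame of the bare background under the same lights; a channel is treated as blocked where it drops below a fraction of the reference. The function name and the 0.5 ratio are hypothetical, and the result is only meaningful where the shadow falls on the background rather than on the printed page.

```python
def shadow_mask(image, reference, ratio=0.5):
    """image, reference: rows of (R, G, B) pixels from the same fixed camera.
    Returns rows of sets naming which lights are blocked at each pixel."""
    names = ("red", "green", "blue")
    out = []
    for img_row, ref_row in zip(image, reference):
        out.append([{n for n, v, rv in zip(names, px, ref_px) if v < ratio * rv}
                    for px, ref_px in zip(img_row, ref_row)])
    return out
```

Because each light's shadow lands in its own channel, one photo yields three separate shadow outlines, each cast from a known light position, which is what lets the book edge shape be triangulated.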

11. Difference-based lighting. Use light control to get better depth information from photographs.

Humans use the direction of light as a cue to depth. Most of our scanning rigs have two or more lights. There's no reason we can't use these lights in a smarter way to get better depth information. In particular, I'm thinking of Anonymous's page splitter idea. The same idea has been proposed under numerous guises before, but I think it would work a lot better if we made better use of the lights.

So imagine that we have two lights.

Turn the left one on.

Then turn the right one on.

Now take the difference between the two -- the page edges are clearly highlighted:

Now, you can make a virtual third light. Add the left and right images:

Looks pretty good!

Now you can play all kinds of games. Add the difference of each back to the original image, or something along those lines, and the edges and the center become highlighted.

Screwing around with contrast and stuff can get you even better data:

Etc., etc. The nice thing is that these are all easy to control (it's easy to switch lights on and off), it's only two shots per capture, and the image math is dead simple to start with: just addition and subtraction.
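The image math above fits in a few lines. A minimal sketch, assuming two aligned grayscale frames (values 0 to 255, lists of rows) from a fixed camera, one with only the left light on and one with only the right. It returns the absolute difference, the clamped sum (the "virtual third light"), and the sum with the difference added back. The function name is hypothetical.

```python
def combine(left_lit, right_lit):
    """Returns (difference, virtual_both, edge_boosted) frames."""
    def clamp(v):
        return max(0, min(255, v))

    diff, both, boosted = [], [], []
    for lrow, rrow in zip(left_lit, right_lit):
        # Large where the two lights disagree, e.g. page edges that shadow one side.
        d = [abs(l - r) for l, r in zip(lrow, rrow)]
        # Sum of both shots: approximates having both lights on at once.
        b = [clamp(l + r) for l, r in zip(lrow, rrow)]
        diff.append(d)
        both.append(b)
        boosted.append([clamp(bv + dv) for bv, dv in zip(b, d)])
    return diff, both, boosted
```

Clamping keeps results in the 8-bit range; a real pipeline might instead scale the sum by one half to avoid clipping bright paper.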

Here are the original images if you'd like to play with them.

Post by **vitorio** » 05 Jan 2011, 12:46

daniel_reetz wrote:1. Look at the lines of text or borders of images on a page and extract the page curvature from them.

I've been a little concerned with this one; I'll be doing design and art books where I'm not sure this will work.

daniel_reetz wrote:4. Use two cameras to photograph both pages with overlapping information. Use this stereo pair to determine 3D structure for dewarping.

A recent paper made this look easy, like they were reconstructing from arbitrary angles as long as they could see the four bounding corners in both.

daniel_reetz wrote:6. Using Sharp sensors for extracting the curvature at several lines on a page.

Doesn't this mean you have to move the sensors across the surface at a known rate for a known distance? Seems like it'd be tricky to do reliably across multiple types of builds, compared to a fixed sensor like a laser or a Kinect.

daniel_reetz wrote:7. Using a laser line to get a reliable line to follow for dewarping.

Hooray! That does look promising. You say you'd need two cameras; why? Why not take two photos in succession from a single camera, one with the lasers on, one with them off? Surely that could be triggered electronically.

daniel_reetz wrote:9. Using a coded aperture camera.

Is this like using light field (plenoptic) cameras?

daniel_reetz wrote:10. Using RGB lighting to get the curvature of the book.

This is also neat. Is this made easier by LED lighting? Either fixed color LEDs or programmatic control of RGB LEDs?

Post by **daniel_reetz** » 05 Jan 2011, 14:30

vitorio wrote:
daniel_reetz wrote:4. Use two cameras to photograph both pages with overlapping information. Use this stereo pair to determine 3D structure for dewarping.
A recent paper made this look easy, like they were reconstructing from arbitrary angles as long as they could see the four bounding corners in both.

Do you have that reference handy? We have people who have offered to liberate papers.

vitorio wrote:
daniel_reetz wrote:6. Using Sharp sensors for extracting the curvature at several lines on a page.
Doesn't this mean you have to move the sensors across the surface at a known rate for a known distance? Seems like it'd be tricky to do reliably across multiple types of builds, compared to a fixed sensor like a laser or a Kinect.

Probably, but mounting a little sensor on a servo is not so hard. Calibration might be less than fun, but making something move in a circle or line, though less attractive than other solutions, is certainly not all that hard.

vitorio wrote:
daniel_reetz wrote:7. Using a laser line to get a reliable line to follow for dewarping.
Hooray! That does look promising. You say you'd need two cameras; why? Why not take two photos in succession from a single camera, one with the lasers on, one with them off? Surely that could be triggered electronically.

Did I say that? I didn't mean it -- I agree, I think two photos from a single camera would do it. And with some tricky lighting, we might be able to get even more out of the system. I am searching for a second laser diode today, so I can do some serious testing with this idea.

vitorio wrote:
daniel_reetz wrote:9. Using a coded aperture camera.
Is this like using light field (plenoptic) cameras?

Yeah, except instead of coding every ray by angle, you modulate the incoming light field in two dimensions. I had a lot more enthusiasm about it before I started trying to implement one; now I think it's entirely unsuitable.

vitorio wrote:
daniel_reetz wrote:10. Using RGB lighting to get the curvature of the book.
This is also neat. Is this made easier by LED lighting? Either fixed color LEDs or programmatic control of RGB LEDs?


You're reading my mind, Vitorio.


Which of these is most attractive to you? I'm ready to start working on them right away. Personally, I like the laser line and programmable lighting ones.

Post by **Tulon** » 05 Jan 2011, 15:25

daniel_reetz wrote: Scan Tailor
(is it still using coupled snakes? )
I know there are other examples of this technique, does anyone remember?

Let me put it in context. Scan Tailor does use coupled snakes, but that's a new thing, implemented literally about a week ago. Rob's dewarper (both versions) never used this technique. Anyway, dewarping is not a single algorithm but a number of different algorithms put together. Coupled snakes are used for deriving accurate and smooth baselines (or x-lines, mid-lines, etc.) of text when you already know where those text lines are located. Compared to other dewarping-related code, this stuff was quite easy to write. The only problem was the complete lack of detail on this technique in the papers mentioning it.
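For readers new to snakes (active contours), the basic idea is a curve that settles where fidelity to image evidence balances its own smoothness. A minimal sketch of a single one-dimensional snake, not Scan Tailor's coupled-snake code: one y value per sample column is pulled toward noisy baseline estimates while a bending penalty keeps it smooth, minimised by plain gradient descent. `alpha`, `step` and `iters` are hypothetical tuning values.

```python
def smooth_baseline(targets, alpha=1.0, step=0.02, iters=500):
    """Gradient descent on
        E(y) = sum (y_i - t_i)^2 + alpha * sum (y_{j-1} - 2*y_j + y_{j+1})^2
    The first term keeps the curve near the measured samples t, the second
    penalises bending. Returns the smoothed samples."""
    n = len(targets)
    y = [float(t) for t in targets]
    for _ in range(iters):
        # Second differences; the two ends have none and are left at zero.
        d = [0.0] * n
        for j in range(1, n - 1):
            d[j] = y[j - 1] - 2 * y[j] + y[j + 1]
        grad = []
        for i in range(n):
            g = 2 * (y[i] - targets[i])  # data term
            # y_i enters d[i-1] and d[i+1] with weight +1, and d[i] with -2.
            left = d[i - 1] if i > 0 else 0.0
            right = d[i + 1] if i < n - 1 else 0.0
            g += 2 * alpha * (left - 2 * d[i] + right)
            grad.append(g)
        y = [yi - step * gi for yi, gi in zip(y, grad)]
    return y
```

Coupling, as the name suggests, would add terms tying two such curves together (say, a baseline and an x-line) so that their spacing stays consistent.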

Post by **daniel_reetz** » 05 Jan 2011, 15:30

Thanks for the clarification, Tulon.

Post by **vitorio** » 05 Jan 2011, 16:39

daniel_reetz wrote:Do you have that reference handy? We have people who have offered to liberate papers.

I'll look for it again. I can also get most any papers through my alumni library access, if someone is looking for something in particular.

Post by **spamsickle** » 05 Jan 2011, 20:54

Just a couple of random thoughts on what will probably be my preferred method, because any of us who've built a 2-camera scanner already have all the equipment required: stereoscopic depth map generation.

A lot of the papers written about the technique explore ideas we probably won't be using, like occlusion detection. As I envision it, both cameras would be photographing both pages, and nothing of interest would be hidden in either view. I don't see where we would need to use anything other than parallax, and each letter of text or corner of an image should provide several points to match on.

Looks like some goober is trying to patent stereoscopic depth mapping - http://www.freepatentsonline.com/y2010/0328437.html
Unbelievable...

Most hardware setups seem to have the cameras pointing parallel to each other. This might make the calculations easier, since the distance between the cameras could be fixed by placing them on a rig, but I wonder if one wouldn't get better separation, more data pixels per picture, and thus more precise depth calculations by separating the cameras a little more and pointing them toward the center of the book, similar to what we do with our platen setups. Even if that won't work, the highly constrained environment in which we're trying to extract depth information should enable us to use some optimizations which the real-time battle-bot coders can't. We don't necessarily need absolute depth information - relative depth with some kind of ad hoc scaling capability might give us all the information we need to correctly unwarp a sequence of pages.

Here's a paper exploring GPU execution -- while it's for a scanning electron microscope, and their method of image capture does introduce some occlusion, they're mostly using parallax. http://etd.ohiolink.edu/view.cgi?acc_num=case1263503965

Post by **andrewgreendf** » 06 Jan 2011, 02:34

Here are a few more ideas:
- Detection of book/page structure and content from pictures of the pages in motion. This is the essence of the book flipping-scanning proposal. http://singularityhub.com/2010/03/23/bo ... ute-video/ Although the prototype shown uses a laser to detect the 3D structure of the book while it's being flipped, I imagine that the algorithm could be perfected to make that unnecessary. Maybe it could also be made to require only a few images of any given page, which might make it possible to use, instead of a high-resolution video camera, a single, consumer camera, which would snap photos every 1/2 second or so while someone turns the pages at a leisurely rate. The underlying principle is the same as stereo imaging: structure from multiple images; it's only different in that, instead of using images separated in space, it uses images separated in time (and considers how things may have changed during the intervals between images).

- Document imaging using many tiny cameras together to get 3D information--like stereo vision on steroids. I once saw a patent application for this, but I can't seem to find it now; maybe someone else can or has seen it.

- A combination of the methods mentioned. This is, I think, how human vision really works: we combine prior knowledge of the objects we're seeing with information from shadows, text baselines and stereo image correspondences, on-the-fly.

Some general comments:
- I would lean towards less hardware and more work done by the software as far as possible. The easier it is to make or obtain a scanner, and the easier that scanner is to carry around, the more it furthers the cause of knowledge and culture, at least in my view.

- If the 3D data and the image used for the actual scan are not taken at the same time, isn't there a danger that the page will move and dewarping will no longer be accurate?

- I learned almost all I know about stereo imaging (which is not much) from Learning OpenCV: Computer Vision with the OpenCV Library, by Bradski and Kaehler (O'Reilly, 2008). See especially chapters 11 and 12. (Chapter 13, on machine learning, also looks like it might be relevant here, but I haven't had time to read it.) See page 454 for a nice point cloud made from stereo images using OpenCV.

Post by **vitorio** » 06 Jan 2011, 06:40

daniel_reetz wrote:Do you have that reference handy? We have people who have offered to liberate papers.

Document capture using stereo vision, 2004 is the paper I was thinking of. Not that recent after all, but it's been cited since then.

Feature extraction to point clouds made me think of Photosynth; I wonder if the stock Photosynth software could be used to generate a point cloud of a book, and with how many images it'd require.

Video mosaicing for document imaging, 2007 is also interesting. It reminds me of Steve Mann's VideoOrbits work, novel techniques to rectify and composite images from video.

Using video made me wonder if superresolution and subpixel interpolation techniques could be used to improve the DPI of the camera-based techniques.

daniel_reetz wrote:Which if these is most attractive to you? I'm ready to start working on them right away. Personally, I like the laser line and programmable lighting ones myself.

Same. I think the regular shadow-differencing light tricks would be easier for others to reproduce than the RGB ones, but I'd expect the math and image processing for one to apply to the other.

And whether you're using stereo pairs or feature extraction or a Kinect, you still end up with a point cloud that you have to turn into a mesh and then straighten out, right?
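For the "straighten out" step, the simplest case gives a feel for it. A minimal sketch, assuming the page bends only across its width (a cylinder-like page), so that a single height profile z(x) describes it. Flattening then maps each sample to its arc length along the surface, u(x) = ∫ √(1 + z′(x)²) dx, approximated here by summing straight segments between samples. The function name is hypothetical.

```python
import math


def unrolled_positions(xs, zs):
    """xs: increasing sample positions across the page (mm); zs: surface
    height at each (mm). Returns each sample's position on the flattened
    page, i.e. the cumulative arc length from the first sample."""
    u = [0.0]
    for i in range(1, len(xs)):
        u.append(u[-1] + math.hypot(xs[i] - xs[i - 1], zs[i] - zs[i - 1]))
    return u
```

Resampling the image so that content seen at x lands at u(x) stretches the steep parts near the gutter back to their true width; pages that also curl along their height need a full mesh instead of one profile.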
