Microsoft Kinect: infrared depth maps for dewarping?

Post by **spamsickle** » 12 Nov 2010, 09:52

Looks good, and congratulations on swatting those build problems. I've encountered most of that software because Tulon is using it for ST, but he's not using GLUT, so I'll check it out sometime this weekend to see what it does. I really like Git and some of the tools that have sprung up around it for version control (and not just for source code), but that's another topic.

I didn't do a lot of infrared photography back in the SLR days, but from what I recall we didn't have special IR lenses. It's just that IR, being lower-frequency radiation, is refracted slightly less by the glass in the lens assemblies, so it comes to focus a little further back; there was a little red dot on the lens to tell you where the IR focus would be after you'd focused the visible light. I don't know what kind of lenses you're trying, or how, but it may just be that all those little Kinect spots are being blurred into disks by the addition of a lens, and the software is no longer getting the information it needs to resolve them. I know when I do super-macro photography by putting a loupe in front of my cell phone camera, I have to "focus" it by getting just the right distance, and that's going to be tricky if you don't see an image in real time.
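To put rough numbers on that focus shift, here is a minimal thin-lens sketch. Everything in it is an assumption for illustration: a Cauchy dispersion model with coefficients roughly those quoted for BK7 glass, a hypothetical symmetric biconvex lens with 50 mm radii, a page 1 m away, f/2.8, and 830 nm for the Kinect's projector (a commonly reported figure, not something measured here). It shows the longer wavelength focusing slightly behind red, and estimates how big each IR dot gets if the sensor sits where red is sharp.

```python
# Thin-lens estimate of how far IR focuses behind red, and how big an
# IR dot gets when the sensor is placed at the red focus.
# All numbers are illustrative assumptions, not measurements.


def refractive_index(wavelength_um, a=1.5046, b=0.00420):
    """Cauchy approximation n = A + B / lambda^2 (coefficients roughly BK7)."""
    return a + b / wavelength_um ** 2


def focal_length(n, r1_mm, r2_mm):
    """Lensmaker's equation for a thin lens in air."""
    return 1.0 / ((n - 1.0) * (1.0 / r1_mm - 1.0 / r2_mm))


def image_distance(f_mm, object_mm):
    """Thin-lens equation 1/f = 1/u + 1/v, solved for the image distance v."""
    return 1.0 / (1.0 / f_mm - 1.0 / object_mm)


def blur_disk(aperture_mm, v_focus_mm, v_sensor_mm):
    """Diameter of the defocus disk on the sensor (similar triangles from the aperture)."""
    return aperture_mm * abs(v_focus_mm - v_sensor_mm) / v_focus_mm


R1, R2 = 50.0, -50.0   # hypothetical symmetric biconvex lens, mm
OBJECT_MM = 1000.0     # page about 1 m away (assumed)
F_NUMBER = 2.8         # assumed aperture setting

f_red = focal_length(refractive_index(0.650), R1, R2)
f_ir = focal_length(refractive_index(0.830), R1, R2)
v_red = image_distance(f_red, OBJECT_MM)  # sensor placed where red is sharp
v_ir = image_distance(f_ir, OBJECT_MM)    # where the IR dots would be sharp
blur = blur_disk(f_red / F_NUMBER, v_ir, v_red)
print(f"f_red={f_red:.2f} mm  f_ir={f_ir:.2f} mm  IR dot blur={blur:.3f} mm")
```

Under these made-up numbers the IR focus lands a fraction of a millimetre behind red, yet each dot smears into a disk on the order of a tenth of a millimetre, far wider than a typical sensor pixel.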

Post by **daniel_reetz** » 12 Nov 2010, 12:11

Yep, that's exactly right: the IR simply focuses at a different distance (just a bit further than red). The precise depth estimation method of the Kinect is still a mystery to me... time to dig into the patent stream. If it is indeed dependent on finely focused dots, modifying it is going to be problematic. Not yet willing to say impossible, but difficult.

Post by **strider1551** » 12 Nov 2010, 16:32

Saw an article while browsing reddit and thought I should bring it up here:
http://arstechnica.com/open-source/news ... mpaign=rss

Microsoft is not amused by the open source software community's effort to build its own Kinect drivers. The company says that it doesn't condone reverse engineering and has vowed to use technical and legal measures to prevent unauthorized third parties from repurposing the Kinect camera.

Post by **spamsickle** » 13 Nov 2010, 11:40

That Japanese system is described here.

The camera operates at 500 frames per second, with a resolution of 1280 by 1024 pixels. For each frame, the system alternates between two capture modes. First it shines regular light on the page and captures text and images. Then a laser device projects lines on the page and the camera captures that as well.
The scanned pages are curved and distorted, but the researchers found a way to fix that. The laser pattern allows the system to obtain a page's three-dimensional deformation using active stereo methods. So they wrote software that builds a 3-D model of the page and reconstructs it into a regular, flat shape.
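The article doesn't say how the page model is flattened. A common simplification for book pages (an assumption here, not necessarily what the researchers do) is to treat the page as bending along one axis only, so a cross-section of measured heights can be "unrolled" by cumulative arc length: a sample at horizontal position x on the curved page lands at distance u along the flattened page. A minimal sketch, with `flatten_profile` a hypothetical helper:

```python
import math


def flatten_profile(xs, zs):
    """Unroll a curved page cross-section.

    xs, zs: sample positions across the page and the measured height at each
    (same units, xs increasing). Returns u, the distance along the page surface
    from the first sample, i.e. where each sample lands once the page is flat.
    """
    if len(xs) != len(zs) or not xs:
        raise ValueError("xs and zs must be non-empty and the same length")
    us = [0.0]
    for i in range(1, len(xs)):
        # Approximate the surface between neighbouring samples by a straight chord.
        us.append(us[-1] + math.hypot(xs[i] - xs[i - 1], zs[i] - zs[i - 1]))
    return us


# Example: a page that curls up as a quarter circle of radius 10 near the spine.
n = 1000
r = 10.0
xs = [r * math.sin(math.pi / 2 * k / n) for k in range(n + 1)]
zs = [r - r * math.cos(math.pi / 2 * k / n) for k in range(n + 1)]
us = flatten_profile(xs, zs)
print(f"projected width {xs[-1]:.2f}, unrolled width {us[-1]:.2f}")
```

The curved page looks narrower from above than it really is (10 units projected versus about 15.7 unrolled here), which is exactly the compression a dewarping step has to undo when it remaps each pixel column from x to u.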

The researcher is trying to develop applications for his high-speed image capture chip. We wouldn't need the speed -- as long as the shape of the page doesn't change between the regular light and IR laser shots, they can be correlated. One paper that discusses using two cameras and "structured light" to resolve depth and lighting disparities is here. I'm a complete novice in this area -- I'd prefer a solution that doesn't require an IR laser, or even "structured light", though I guess I could justify $150 for a Kinect if it could be made to work with an easy mod and turnkey software. Even so, a platen-less system that would require nothing more than a pair of cameras and ambient lighting would be preferable.

It seems to me that the only reason structured light is employed is because the scene in question lacks texture in some areas. I don't see this as a requirement for our application, because those areas which lack texture are presumably areas of blank page, and can essentially be ignored. The only things we need to transform are text (brimming with texture, no pun intended), or image. I guess we could get some distortion artifacts if images in our books are monochromatic and we make bad guesses about depth, but I'm willing to burn that bridge when we come to it. A software application that handled nothing but text would be a good first step, and would handle the majority of the books I'm interested in scanning.
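One way to test the "text is full of texture" idea is to mark where local contrast is high enough for stereo matching to have something to lock onto, and ignore the rest. The sketch below uses local intensity variance in a small window as the texture measure; the window radius and threshold are arbitrary placeholders, and `texture_mask` is a hypothetical name.

```python
def texture_mask(img, radius=2, threshold=100.0):
    """Mark pixels whose neighbourhood has enough contrast to match reliably.

    img: grayscale image as a list of equal-length rows (0-255).
    Returns a same-shaped list of booleans: True where the local variance in a
    (2*radius+1)^2 window (clipped at the borders) exceeds threshold.
    """
    h, w = len(img), len(img[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - radius), min(h, y + radius + 1))
                    for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            mask[y][x] = var > threshold
    return mask


# Blank white page with one dark vertical stroke (a crude "letter") in column 5.
page = [[0 if x == 5 else 255 for x in range(10)] for _ in range(10)]
for row in texture_mask(page):
    print("".join("#" if m else "." for m in row))
```

Only the band around the stroke is flagged; the blank margins come out False, which is the "can essentially be ignored" region in the argument above.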

Post by **univurshul** » 13 Nov 2010, 11:52

Fundamental hardware placement of both the Kinect + SLR:

Don't both of these devices need to be centered above the subject matter in the same location? How does this work when you have 2 devices fighting for space to survey the same field?

Or maybe the Kinect shoots at an angle, with the camera overhead?

Post by **vitorio** » 13 Nov 2010, 16:38

spamsickle wrote:
> It seems to me that the only reason structured light is employed is because the scene in question lacks texture in some areas. I don't see this as a requirement for our application, because those areas which lack texture are presumably areas of blank page, and can essentially be ignored. The only things we need to transform are text (brimming with texture, no pun intended), or image.

I don't think "any" texture is what they're talking about; they're talking about a known texture. These scanning systems know what the structured light looks like when projected at a known distance against a flat, blank surface (applying texture), and they compute the differences against that known reference to correct for the distortion.

There's no way to know what a flat version of the curved page you're scanning is supposed to look like, so I believe you still need the structured light. The differences in how the light is rendered tell you how curved the page is and where, which you can then fix.
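The "known reference" idea can be written down compactly. Assume a reference image of the pattern is recorded on a flat surface at depth Z0, the camera and projector are separated by a baseline b, and the camera's focal length is f in pixels. Then a point at depth Z shifts the pattern sideways by `f*b*(1/Z - 1/Z0)` pixels relative to the reference, and solving for Z gives the sketch below. This is generic projector/camera triangulation, offered as an assumption about how such systems work rather than a description of any particular product; the helper name and the numbers in the example are placeholders.

```python
def depth_from_reference_shift(shift_px, ref_depth_mm, focal_px, baseline_mm):
    """Depth of a point from the sideways shift of a projected pattern.

    Uses shift = f * b * (1/Z - 1/Z0), so a positive shift means the surface
    is closer than the reference plane at Z0.
    """
    inv_depth = 1.0 / ref_depth_mm + shift_px / (focal_px * baseline_mm)
    if inv_depth <= 0:
        raise ValueError("shift implies a point at or beyond infinity")
    return 1.0 / inv_depth


# Placeholder calibration: f = 580 px, b = 75 mm, reference plane at 1 m.
for shift in (0.0, 10.0, 43.5):
    print(shift, round(depth_from_reference_shift(shift, 1000.0, 580.0, 75.0), 1))
```

A zero shift reproduces the reference plane, and larger shifts mean the surface is closer, which is how the pattern's distortion turns into a height map of the page.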

Post by **vitorio** » 13 Nov 2010, 16:42

univurshul wrote:
> Don't both of these devices need to be centered above the subject matter in the same location? How does this work when you have 2 devices fighting for space to survey the same field?

The Kinect has its IR camera to the right of the IR projector, and the visible-light camera further offset to the left: http://www.ifixit.com/Teardown/Microsof ... own/4066/2

I think placement can be somewhat arbitrary, as long as it's always consistent and known so it can be accounted for in the processing software.
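Concretely, "consistent and known" placement amounts to a fixed rotation R and translation t between the depth sensor and the SLR, plus the SLR's intrinsics (focal lengths fx, fy and principal point cx, cy), all found once by calibration. With those, each depth point can be mapped onto the SLR image with a standard pinhole projection. A minimal sketch; `project_to_slr` and every number in the example are hypothetical:

```python
def project_to_slr(point, R, t, fx, fy, cx, cy):
    """Map a 3-D point from the depth sensor's frame to SLR pixel coordinates.

    point: (X, Y, Z) in the depth sensor's frame (mm).
    R, t: rotation (3x3 nested list) and translation (mm) taking depth-sensor
          coordinates into the SLR's frame, from a prior calibration.
    fx, fy, cx, cy: SLR pinhole intrinsics in pixels.
    """
    # Rigid transform: p_slr = R * p_depth + t
    p = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if p[2] <= 0:
        raise ValueError("point is behind the SLR")
    # Pinhole projection onto the SLR sensor.
    return fx * p[0] / p[2] + cx, fy * p[1] / p[2] + cy


# Hypothetical rig: cameras parallel, SLR mounted 50 mm to the +X side of the
# depth sensor, so depth-frame points appear shifted by -50 mm in the SLR frame.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [-50.0, 0.0, 0.0]
u, v = project_to_slr((0.0, 0.0, 1000.0), R, t, 3000.0, 3000.0, 2000.0, 1500.0)
print(u, v)
```

Because only R and t change between rigs, the devices can sit side by side or at an angle; what matters is that the mount doesn't move after calibration.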

Post by **spamsickle** » 13 Nov 2010, 18:04

vitorio wrote:
> spamsickle wrote:
>> It seems to me that the only reason structured light is employed is because the scene in question lacks texture in some areas. I don't see this as a requirement for our application, because those areas which lack texture are presumably areas of blank page, and can essentially be ignored. The only things we need to transform are text (brimming with texture, no pun intended), or image.
>
> I don't think "any" texture is what they're talking about; they're talking about a known texture. These scanning systems know what the structured light looks like when projected at a known distance against a flat, blank surface (applying texture), and they compute the differences against that known reference to correct for the distortion.
>
> There's no way to know what a flat version of the curved page you're scanning is supposed to look like, so I believe you still need the structured light. The differences in how the light is rendered tell you how curved the page is and where, which you can then fix.

I think you might need the structured light if you were trying to do this with one camera. With two cameras, especially in the problem space we're discussing (where, for instance, nothing of interest is occluded), I'm betting we don't need structured light. I think with properly oriented and calibrated cameras, the difference between the right view and the left view will provide all the depth information we need.

I could be wrong about all of this (just really started looking into it today), but it seems to me that things like the IR rangefinding depend on structured light and one camera, and choose IR only to cut down on environmental noise. With two cameras, it seems more important to get them lined up properly and calibrate them in that configuration. From that point on, as long as the configuration doesn't change, finding the "a" in view 1 and matching it to the "a" in view 2 seems to be all that's needed to recover depth.
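For a rectified pair (rows aligned, so the "a" in one view lies on the same row in the other), that matching can be sketched as block matching: slide a small window along the row in the right image and keep the offset (disparity d) with the smallest sum of absolute differences. Depth then follows from triangulation as Z = f*B/d, with f the focal length in pixels and B the baseline. The helper names, window size and search range are illustrative assumptions; practical matchers add sub-pixel refinement and consistency checks.

```python
def match_disparity(left_row, right_row, x, half_window=2, max_disparity=32):
    """Disparity at column x of a rectified left row, by SAD block matching.

    A feature at column x in the left image is searched for at x - d in the
    right image, for d = 0..max_disparity.
    """
    if x - half_window < 0 or x + half_window >= len(left_row):
        raise ValueError("window does not fit in the left row")
    best_d, best_cost = None, float("inf")
    for d in range(max_disparity + 1):
        xr = x - d
        if xr - half_window < 0:
            break  # window would fall off the left edge of the right row
        cost = sum(abs(left_row[x + k] - right_row[xr + k])
                   for k in range(-half_window, half_window + 1))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(d_px, focal_px, baseline_mm):
    """Triangulated depth Z = f * B / d for a rectified stereo pair."""
    if d_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_mm / d_px


# Toy rows: a short "letter" profile at columns 20-24 on the left,
# seen 5 pixels further left in the right image.
left = [0] * 40
left[20:25] = [10, 200, 50, 180, 30]
right = [0] * 40
right[15:20] = [10, 200, 50, 180, 30]

d = match_disparity(left, right, 22)
print(d, depth_from_disparity(d, focal_px=1000.0, baseline_mm=60.0))
```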

Our application is simpler in many ways than the typical depth map application. Most of those I've seen want "real time"; that's something that we don't need at all. On the other hand, they're often able to get by with crude estimates of depth (collision avoidance doesn't need to know exactly how far something is, just that it isn't "too close"), while precision will be more critical for us. And, as I mentioned before, we only care about information on a page, not every pixel in every frame.
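The precision point can be quantified. Differentiating Z = f*B/d gives |dZ| ≈ Z^2 * dd / (f*B), so depth error grows with the square of distance and shrinks with a longer baseline or focal length. A quick check with hypothetical numbers (page 500 mm away, 3000 px focal length, 100 mm baseline, quarter-pixel matching error):

```python
def depth_resolution(depth_mm, focal_px, baseline_mm, disparity_err_px):
    """First-order depth uncertainty for a stereo pair: Z^2 * dd / (f * B)."""
    return depth_mm ** 2 * disparity_err_px / (focal_px * baseline_mm)


Z, F, B, DD = 500.0, 3000.0, 100.0, 0.25
approx = depth_resolution(Z, F, B, DD)

# Compare with the exact change in depth when the disparity moves by DD.
d = F * B / Z
exact = F * B / d - F * B / (d + DD)
print(f"approx {approx:.3f} mm, exact {exact:.3f} mm")
```

Under these assumptions the depth step is about 0.2 mm, which gives a feel for how baseline and resolution trade against the precision a dewarper would need.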

There is a library out there called "OpenCV" which seems to have implemented routines which would be useful for building depth maps from stereo views. I want to work on Scan Tailor first, but stereo depth maps are now in the queue too.

Post by **daniel_reetz** » 13 Nov 2010, 18:13

I will continue to hack on the Kinect, but am less and less enthused about its capabilities.

Structured light or no, if we could get reliable dewarping/camera pose estimation at a budget of 1-2 seconds per page, that IS realtime, and really important.

Post by **rob** » 13 Nov 2010, 19:39

Well done! I wonder if Kinect matches the (known) output dot pattern with what it gets from its camera. The smaller the pattern is, the farther away the pattern is.
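One generic way to do that kind of matching, offered as an illustration rather than a claim about how the Kinect actually works, is to take a window of the known reference pattern and slide it along the observed row, keeping the offset with the highest normalized cross-correlation (which is unaffected by overall brightness and contrast changes). The resulting shift is what a triangulation formula would then turn into depth. The helper names and the toy dot pattern are hypothetical:

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length sequences (-1..1)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / den if den > 0 else 0.0


def find_pattern_shift(reference, observed, start, length, max_shift):
    """Offset (in pixels) at which reference[start:start+length] best matches observed."""
    template = reference[start:start + length]
    best_shift, best_score = 0, -2.0
    for s in range(-max_shift, max_shift + 1):
        lo = start + s
        if lo < 0 or lo + length > len(observed):
            continue  # shifted window would run off the observed row
        score = ncc(template, observed[lo:lo + length])
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift, best_score


# Toy irregular dot row (255 = dot) and the same row observed 3 pixels to the right.
reference = [0, 255, 0, 0, 255, 255, 255, 0, 255, 0, 0, 0, 255, 0, 255,
             255, 0, 0, 0, 0, 255, 0, 255, 0, 0, 255, 255, 0, 255, 0]
observed = [0, 0, 0] + reference[:-3]
print(find_pattern_shift(reference, observed, start=10, length=12, max_shift=5))
```

An irregular pattern matters here: with a strictly repeating one, several offsets would correlate equally well and the shift would be ambiguous.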

Have you determined what the closest possible depth is? I think you will probably need to get those infrared goggles to see what the pattern looks like with lenses.

DIY Book Scanner