A Brief Investigation into the Visual Bioapparatus

DIY Book Scanner Skunk Works. Share your crazy ideas and novel approaches. Home of the "3D structure of a book" thread.

Moderator: peterZ

rob
Posts: 773
Joined: 03 Jun 2009, 13:50
E-book readers owned: iRex iLiad, Kindle 2
Number of books owned: 4000
Country: United States
Location: Maryland, United States

A Brief Investigation into the Visual Bioapparatus

Post by rob »

I recently watched a TED talk by Patricia Kuhl (The linguistic genius of babies) in which she found that if babies aren't exposed to the sounds of a particular language by the time they are about 10 months old, they will have difficulty later in life hearing the subtle sounds of that language. The brain's plasticity with respect to hearing language sounds decreases rapidly after 10 months.

I was also reminded of the cat experiment in which a kitten was raised in a room full of vertical stripes, and later had difficulty seeing where tabletops ended because it had trouble perceiving horizontal features. It was, however, really good at vertical things.

If I were to feed a brain still in its plastic stage nothing but images of black text on a white background, what sort of features would die off, and what sort would stay? More to the point, what sort of features are even considered?

I found this paper (Young, 2001, The Gaussian Derivative model for spatial-temporal vision: I. Cortical Model) which posited the following:

1. That the initial features were gaussians and derivatives of gaussians.
2. That the initial features could be stretched to provide feature detection at different scales.
3. That the initial features could be rotated to provide feature detection at different angles.

So I wrote a set of filters, which look like this. I used scales of 2, 4, 8, and 16, which lead to the coverage areas shown. Pixels inside the coverage area count; pixels outside don't matter. In the following graph, the coverage area is approximately circular around the center, with a radius of about 8 pixels. Also, pixels farther from the center count for less.
[Image: gauss2x2.eps.png]
The idea behind the filter is that you place it at every point on the image and total up the result. A black pixel counts as 1, and a white pixel counts as 0. Multiply each pixel by the value of the filter at that point, sum, and compare against the maximum possible value of the filter (what you'd get if every pixel in the coverage area were black). If the sum exceeds half that maximum, we have a "hit" and the filter registers a positive result.
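In case the mechanics are unclear, here is a minimal Python/NumPy sketch of that scheme -- my own reconstruction, not rob's actual code. The function names are made up, and the coverage radius of 4x the larger scale is an assumption read off the scale/radius table below.

[code]
import numpy as np
from scipy.ndimage import convolve

def gaussian_kernel(sx, sy):
    """2D gaussian with x-scale sx and y-scale sy.
    Assumed coverage: a circle of radius 4 * max(sx, sy)."""
    r = 4 * max(sx, sy)
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    g = np.exp(-(x**2 / (2.0 * sx**2) + y**2 / (2.0 * sy**2)))
    g[x**2 + y**2 > r**2] = 0.0   # pixels outside the coverage area don't count
    return g

def hits(page, kernel):
    """page: binary image, 1 = black ink, 0 = white paper.
    A pixel is a 'hit' when the filter response there exceeds half the
    maximum possible response (every positive-weighted pixel black)."""
    response = convolve(page.astype(float), kernel, mode='constant')
    max_response = kernel[kernel > 0].sum()
    return response > 0.5 * max_response
[/code]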

Here are filters at other scales. The ranges are:

scale 2, radius 8
scale 4, radius 16
scale 8, radius 32
scale 16, radius 64
[Image: gauss4x4.eps.png]
[Image: gauss8x8.eps.png]
[Image: gauss16x16.eps.png]
Filters can be stretched in x or y:
[Image: gauss2x8.eps.png]
They can also have different angles:
[Image: gauss2x8a4rad.eps.png]
Young found that changing the shape of the filter by taking its derivative in x (before rotating it) matched the filters found in actual visual neurons. Taking the derivative also introduces negative regions, where pixels count against the value of the filter. Here are graphs for the first, second, third, and fourth derivative filters.
[Image: gauss1x4x4.eps.png]
[Image: gauss2x4x4.eps.png]
[Image: gauss3x4x4.eps.png]
[Image: gauss4x4x4.eps.png]
You can see that a gaussian filter of derivative N (0, 1, 2, 3, 4) has N+1 coverage areas.
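That lobe count falls out of a standard identity: the Nth derivative of a gaussian is the gaussian times an Nth-order Hermite polynomial, which changes sign N times. A sketch of how such a kernel could be built (again my own reconstruction building on the sketch above, not rob's code):

[code]
import numpy as np
from numpy.polynomial.hermite import hermval

def gaussian_derivative_kernel(n, sx, sy, angle):
    """Nth x-derivative of a 2D gaussian, rotated by `angle` radians.
    n = 0 gives the plain blob filter."""
    r = 4 * max(sx, sy)
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    # rotate coordinates so the derivative runs along the filter's own axis
    xr = x * np.cos(angle) + y * np.sin(angle)
    yr = -x * np.sin(angle) + y * np.cos(angle)
    g = np.exp(-(xr**2 / (2.0 * sx**2) + yr**2 / (2.0 * sy**2)))
    # d^n/dx^n exp(-x^2/(2s^2)) = (-1/(s*sqrt(2)))^n * H_n(x/(s*sqrt(2))) * exp(-x^2/(2s^2))
    c = np.zeros(n + 1)
    c[n] = 1.0
    h = hermval(xr / (sx * np.sqrt(2)), c)
    return (-1.0 / (sx * np.sqrt(2)))**n * h * g
[/code]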

Young also said that the 1st-derivative gaussian corresponds to edge detectors, and the 2nd-derivative gaussian to bar detectors (although it seems to me to be a negative bar detector). I'm also calling the 0th-derivative gaussian a "blob detector".

So I made up a set of filters. Each filter had a derivative number (0, 1, 2, 3, or 4), an x-scale and y-scale number (2, 4, 8, 16), and an angle (64 angles equally spaced around 360 degrees), for a total of 5 x 4 x 4 x 64 = 5120 filters. Then I took a single page of text, ran all the filters against it, and sorted the filters by their number of hits from most to least. The run took 16 hours to complete. What did I find?
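Mechanically, the sweep would look something like this (a hypothetical driver built on the sketches above; `page` is the binarized page image):

[code]
import itertools
import numpy as np

derivs = [0, 1, 2, 3, 4]
scales = [2, 4, 8, 16]
angles = [k * 2.0 * np.pi / 64 for k in range(64)]   # 64 equally spaced angles

hit_counts = {}
for n, sx, sy, a in itertools.product(derivs, scales, scales, angles):
    kernel = gaussian_derivative_kernel(n, sx, sy, a)
    hit_counts[(n, sx, sy, a)] = int(hits(page, kernel).sum())

# sort the 5120 filters by number of hits, most to least
ranking = sorted(hit_counts.items(), key=lambda kv: kv[1], reverse=True)
[/code]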

The most commonly activated filter was the 0-derivative 2x2 blob detector. The next most common was the 0-derivative 2x4 blob detector at pretty much any angle, with some angles being slightly better than others -- but not significantly so.

Next was the 0-derivative 4x4 blob detector, followed by the 0-derivative 2x8 blob detectors, then the 4x8 detectors, then a few 2x16 detectors at various angles very near horizontal and vertical.

Next came some 1-derivative 4x2 detectors. These have positive and negative regions along the axis of the detector, and are twice as long in that axis. The angles were specifically from 129 to 157 degrees, and from 309 to 331 degrees.

And so on.

Some filters were never activated at all -- most of the 2nd-, 3rd-, and 4th-derivative filters of size 2x8 or larger.

How many filters out of the 5120 were activated at least 5% of the time (taking the most common filter as 100%)? 2510 filters, the last of these being the 4th-derivative 8x8 blob.
The Singularity is Near. ~ http://halfbakedmaker.org ~ Follow me as I build the world's first all-mechanical steam-powered computer.

Re: A Brief Investigation into the Visual Bioapparatus

Post by rob »

Some examples of the image after filtering. Here is the image after filtering by the 2x16 "blob" detector at zero degrees (which is the same as a 16x2 blob detector at 90 degrees). Remember, this is a binary image consisting only of those points where the filter gets a high enough value.
[Image: preface-0-2-16-0.png -- image after 2x16 blob detector]
Can you read it? You can probably make out a lot of words. Here's a close-up:
[Image: preface-0-2-16-0-c.png]
You might be able to make out some highly distinct words, such as "Another". What about the first word on the sixth line? The reason you can't make out as much in the second image as in the first is probably scale. It's likely that the detectors in your visual bioapparatus which recognize letters and words can't process this and fill in the missing information at this scale. Perhaps Daniel will weigh in here; he's studied visual neuroanatomy.

Here's the image for the 1st-derivative 4x2 at 145 degrees. It's sort of an edge detector, picking out mainly the right and top edges of lines.
[Image: preface-1-4-2-145-c.png]
The same thing, but 2nd derivative. Only bits and pieces of each letter are picked out -- this filter picks out 17% of the pixels that the simple 2x2 blob detector does. Are these useful features that could be combined with other features to recognize letters?
[Image: preface-2-4-2-145-c.png]
For comparison, the same filter but at an angle of 80 degrees. This one picks out 5% of the pixels. Interesting features?
[Image: preface-2-4-2-80-c.png]
Now, clearly something that picks out 100% of pixels may not be very useful as a feature detector. After all, if all it does is say, "Yes, there's a pixel here, and a few next to it," it's not telling you much. On the other hand, filters that hardly pick out anything might be equally useless. So, going by the "linguistic genius babies" and cat experiments, feature filters that respond too often or not often enough would be killed off. But what are the thresholds for killing off feature filters? Perhaps a filter that doesn't go off much (for example, 5% of the time) is indicating something really interesting, while one that goes off, say, 20% of the time isn't of much use.
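If you wanted to experiment with such a kill-off rule, it might be as simple as this (a hypothetical helper over the hit_counts table from the earlier sketch; the 5%/20% survival band is purely a guess, as the paragraph above says):

[code]
def prune_filters(hit_counts, low=0.05, high=0.20):
    """Keep only filters whose hit rate, relative to the most active
    filter, falls inside the survival band [low, high]."""
    top = max(hit_counts.values())
    return {f: c for f, c in hit_counts.items() if low <= c / top <= high}
[/code]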

And, how do these feature filters combine at higher levels to recognize larger shapes?

Re: A Brief Investigation into the Visual Bioapparatus

Post by rob »

Cross-correlations! This is a fun statistical technique (inasmuch as anything statistical can actually be said to be fun) used to determine how strongly or weakly related two sets of data are. If one set is "times the ground is wet" and the other is "times it rained", you would expect the two sets to match nearly all the time, so their cross-correlation would be nearly 1. If, on the other hand, one set is "times the ground is dry" and the other is "times it rained", you would expect the two sets to be opposites nearly all the time, so their cross-correlation would be nearly -1. Only a correlation near zero tells you that the two sets are essentially unrelated, as with the sets "times the ground is dry" and "the price of tea in China".

Then there is auto-correlation, which is just cross-correlation applied to the same set. Suppose your set was "times it was light outside". An auto-correlation would show that your set had a high positive correlation 24 hours apart, -24 hours apart, 48 hours apart, -48 hours apart and so on, because of the cyclical nature of outside light. There would also be a high negative correlation 12 hours apart, -12 hours apart, 36 hours apart, -36 hours apart, and so on, corresponding to the dark part of the cycle.
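As a toy illustration of that daylight example (a Python sketch; the series and the normalization are my own):

[code]
import numpy as np

# "times it was light outside": 1 for 12 light hours, 0 for 12 dark, over 10 days
light = np.tile([1.0] * 12 + [0.0] * 12, 10)
x = light - light.mean()   # center the series so correlations can go negative
ac = np.correlate(x, x, mode='full') / (x.var() * len(x))   # 1.0 at zero lag
lags = np.arange(-len(x) + 1, len(x))

print(ac[lags == 24])   # about +0.90: high positive correlation one day apart
print(ac[lags == 12])   # about -0.95: high negative correlation half a day apart
[/code]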

Now, suppose we had a binary image of the sort that comes out of our filters, where a pixel is either on or off, and we ran an auto-correlation on the image. This means taking every possible coordinate shift and comparing the two images to see where dots coincide. Of course, at a shift of (0,0) all the dots will coincide, and so the correlation is exactly 1. At larger shifts, one would expect fewer and fewer dots to line up, and so the correlation would rapidly drop to zero. We would not expect negative correlations, because that would imply white dots in one image wherever there were black dots in the other image, and black dots in one image wherever there were white dots in the other image, and that just doesn't happen.
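Comparing the image against itself at every possible shift is exactly what the FFT is good at; here is a sketch of the usual shortcut. (Note this computes a circular autocorrelation, so in practice you'd zero-pad the image first; the normalization puts 1.0 at the (0,0) shift, as described above.)

[code]
import numpy as np

def autocorrelation(img):
    """Autocorrelation of a 0/1 image over all shifts, via the FFT.
    The (0,0) shift lands at the center of the returned array."""
    f = np.fft.fft2(img.astype(float))
    ac = np.fft.ifft2(f * np.conj(f)).real   # count of coinciding dots at every shift
    ac = np.fft.fftshift(ac)                 # move the zero shift to the center
    return ac / ac.max()                     # normalize so the (0,0) shift is 1.0
[/code]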

Using the 2x2 blob detector (whose result looks hardly different from the original image), an autocorrelation shows the following, where white is 1 and black is 0, and the center of the image corresponds to a shift of (0,0):
[Image: autocorr-2x2.png]
The horizontal bands indicate that the image is correlated with itself under vertical shifts (i.e. the image has horizontal lines at regularly spaced intervals). There are certainly other things you can take away from the autocorrelation, but one important thing is this: 2x2 blobs are never as highly correlated with other 2x2 blobs as they are at a shift of (0,0). As you shift the image, the chance of encountering another 2x2 blob drops. The vast majority of correlations are under 0.25.

What if we correlate a very rare feature with a different very rare feature? Any correlation would be interesting. Let's look at, for example, the 1st-derivative 281-degree 2x16 edge detector (activated 0.3% of the time) against the 1st-derivative 67-degree 8x16 edge detector (activated 0.4% of the time).

Here are the two filter results. They look kind of like scratches in the dirt. Very disorganized.
[Image: 280-67.png]
And in fact the correlation is pretty much zero everywhere -- the highest correlation is 0.04, which is barely anything.

Now, for the most part, we are only interested in relatively local correlations. Correlations between items halfway across the page aren't really useful to us at this point. I'm more interested in correlations showing letters and shapes being formed, and those are necessarily local. I'm defining the local area as a circle of radius 128, which is twice as large as the largest feature detector being compared. Anything inside that circle is local, and anything outside is not relevant.
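A sketch of how that locality cutoff might be applied to a pair of filter outputs (the geometric-mean normalization is my assumption; the post doesn't say how figures like the 0.04 above were normalized):

[code]
import numpy as np

def local_cross_correlation(a, b, radius=128):
    """Cross-correlate two binary hit maps over all shifts, zeroing out
    any shift farther than `radius` pixels from (0,0)."""
    fa = np.fft.fft2(a.astype(float))
    fb = np.fft.fft2(b.astype(float))
    cc = np.fft.fftshift(np.fft.ifft2(fa * np.conj(fb)).real)
    cc /= np.sqrt(a.sum() * b.sum())          # assumed normalization
    cy, cx = cc.shape[0] // 2, cc.shape[1] // 2
    y, x = np.ogrid[:cc.shape[0], :cc.shape[1]]
    return np.where((y - cy)**2 + (x - cx)**2 <= radius**2, cc, 0.0)
[/code]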

How about two filters that activate 10% of the time, and are close in nature: 1st derivative 4x8 filters, one at 298 degrees and one at 118 degrees -- 180 degrees apart?
[Image: corr-180.png]
This makes sense: these filters roughly pick out the tops and bottoms of letters, and so one would expect that where there is a top of a letter, there is also a bottom.

But more importantly, how should I search the 5120 filters for correlations with each other? There are 5119 + 5118 + ... + 1 such pairs, or approximately 5120 x 2560 = 13 million. Running 13 million correlations is not quick. Not even close.
spamsickle
Posts: 596
Joined: 06 Jun 2009, 23:57

Re: A Brief Investigation into the Visual Bioapparatus

Post by spamsickle »

Back when I first became interested in the topic of computer vision, I picked up a book by David Marr (now deceased, I think), called "Vision: A Computational Investigation into the Human Representation and Processing of Visual Information". He made the case that the human visual system (at least the earlier parts, where rods and cones are firing off signals, and inhibiting signals from firing around them) could be modeled by a "difference of gaussians". If I'm correctly understanding 1/10th of what you're saying here, it seems that there are lots of other interesting filters that may be playing between eye and brain.

Have you looked into CUDA and cloud processing for determining your correlations? It seems like the sort of thing that could be written to take advantage of parallel processing power.

Do you think you might find some efficient ways to distinguish between text, greyscale/color images, binary images, text on colored background, colored text vs "pure palette" charts and illustrations, etc.?

Re: A Brief Investigation into the Visual Bioapparatus

Post by rob »

Difference of gaussians is effectively the same as derivatives of gaussians -- there is just slightly less computation involved, not enough to matter in my case. The biggest problem is that most of these computations are one-off. I've tried CUDA (well, OpenCL anyway) and found it to be barely faster than running on the CPU, because my GPU is slower than my CPU. I calculate that 13 million correlations will take my laptop about four years to crunch through, so unless I come up with some other idea, not even cloud computing will save me -- I used Amazon's EC2 compute cloud for genetic programming a few months back, and I was able to get 64 processes running at a time. But I still need to reduce four years to something more like one day (four years is roughly 1500 days, so a 100x faster machine spread across 64 cloud instances would just about get me there), which means I need to wait for computers to get about 100 times faster (or 100 times more CPU dense, or combinations thereof). That's about 10 years from now... which is plenty of time to find other, more efficient ideas.

So, computational neuroscientists... no pressure!