Algebra and Self-Driving Cars

What's the point of high school math? Part 3

Oct 27, 2023

Welcome back to Part 3 of “What’s the point of high school math”? Part 1 reviewed how computer vision engineers think about cameras, and Part 2 introduced the concept of terrain relative navigation (TRN), or the ability of computers to navigate by comparing real-time photos to photos in a database. Part 2 discussed the first step of TRN - matching up parts of the photo with pictures in our database. Part 3 is going to discuss how algebra can be used to figure out which of these matches, or “landmarks”, are correct.

So, let’s return to our photo of Penny. Let’s say that we picked out the three landmarks below in our overexposed real-time photo and matched them to sections of our database photo. Clearly, humans can see that the two green landmarks matched correctly, while the red landmark is a mistake. But how can we get the computer to recognize this?

The key insight here that allows us to solve this problem is that the landmarks are not independent of each other. All the landmarks in our real-time photo appeared in the same picture, so they all need to be consistent with each other. It doesn’t make sense to say “To get landmarks A and B in the database picture, we shifted 2 pixels to the right, and to get landmark C, we shifted 5 up and 10 to the left”. In mathematical terms, all landmarks must lie on the same plane. The algorithm that does this is caused RANdom SAmpling Consesnus (RANSAC). Again, for pictures, this algorithm is performed on a plane, but that is difficult to visualize in 2D, so the example below is going to use points fit to a line y = mx + b.

Let’s start with some points in 2D space:

We want to figure out the line that best fits all of these points. So, our first step is to pick 3 random points and use them to fit a line y = mx + b.

Clearly, humans can see that this line does not match very well, because the line is not very close to most of the points. We need to put that in mathematical terms so the computer can understand that. We can do that set a standard deviation that represents how close we want the line to be to a point to be considered a good match, then count how many points are within those lines. Points within the standard deviation are inliers, and points outside it are outliers. You can think of the standard deviation here as a “fudge factor” that can account for noise or errors in calculation.

In the example above, there is only one inlier. So, we can try again:

And get 4 outliers. We can repeat this for a given number of times, and keep the line that gives us the most inliers. To figure out how many times to run the algorithm, we can use the equation:

\(N = \frac{\log(1 - p)}{\log(1 - (1 - e)^{s})}\)

where

N is the number of iterations to run
e is the probability that any given point is an outlier
s is the number of points we are using to fit our line
and p is our desired probability that we get at least one set of points that are all inliers.

This part may be better classified as statistics, but it is still very useful. We can figure out how many points we expect to be inliers by testing our algorithm on realistic data. Then, we can use that equation to examine the tradeoffs between time and accuracy. The more likely a point is an outlier, the more likely we are to need to run RANSAC more times to get a good line. In addition, the more points we use to make the line, the more likely we are to get an outlier. The picture below demonstrates this relationship. The X axis shows the percent chance that any given point is an outlier. The Y axis shows how many rounds of RANSAC are needed to have 99% confidence that you have selected all inliers. The colors show results for different number of samples.

RANSAC is useful not only because it tells us which landmark matches are inaccurate, but also because it can tell us the relationship between our real-time photo and the database photos. The difference between our real-time photo and the photo we expect to see tells us how wrong our navigation estimate is. Putting this into mathematical terms allows us to update our navigation filter, and get a better idea of where we are.

Science for the Unscientific

Discussion about this post