Trig and Self-Driving Cars

What's the point of high school math? Part 7

Dec 22, 2023

This is the seventh installment in the “What’s the point of high school math?” series, where I walk through the process of programming a self-driving car to “see” and navigate its environment and discuss how high school math is relevant. We’ve now broken down all the steps needed for a self-driving car to use sensors to navigate:

Part 1 introduces the terminology that computer vision engineers use to talk about such a problem.
Part 2 covers statistics.
Part 3 talks about algebra.
Part 4 brings us to geometry.
Part 5 covers miscellaneous AP classes.
Part 6 talks about how to combine all these things into a navigation system that can predict where the car is going to go next.

This is all useful, but it ignores one key fact about driving cars: there are other cars on the road, not to mention bikes, people, animals, and emergency vehicles. That means it is not enough to predict where we are going; we also need to perform hazard detection to see if there is anything in our way.

Obviously, if we wanted to use cameras to perform hazard detection for self-driving cars in real life, we would attach the cameras to the cars. However, this complicates the math, so pretend that we have a drone flying slightly ahead of our self-driving car with a camera pointed straight down at the road, like so:

We can take pictures as the car moves, then perform traditional computer vision or more advanced machine learning algorithms to find hazards in the image. However, this is not helpful to us unless we know how big the hazard actually is. I don’t want the car to swerve dangerously to avoid a pebble, but I also don’t want it to plow right over a pedestrian. In order to accomplish that, we need to relate the picture we took to the real world.

The diagram above may look familiar to those who recently took trigonometry: it is two right triangles stacked together. If you went to high school in America, you were probably taught the mnemonic SOH-CAH-TOA, which corresponds to the equations:

\(\sin{\theta} = \frac{\text{opposite}}{\text{hypotenuse}}\)

\(\cos{\theta} = \frac{\text{adjacent}}{\text{hypotenuse}}\)

\(\tan{\theta} = \frac{\text{opposite}}{\text{adjacent}}\)

for the right triangle:

Here, “right triangle” means that one of the angles is 90 degrees. Personally, I do not find this particularly helpful. It just seems like a random set of equations, and couldn’t I just draw another triangle with different values? Why would there be a relationship between the triangle’s angles and its sides? I find it a lot easier to understand these rules when, instead of an abstract triangle, I picture something real. Imagine that you’ve been locked out of your house, and you know the second floor window is unlocked. You have an extendable ladder, and you can put it close to the wall and climb up. Think of the shape that would make: there wouldn’t be much ground between the wall and the foot of the ladder, and there would be a narrow angle between the ladder and the wall. What if there was a flower bed in the way? Then you would have to place the ladder at a bigger angle, and there would be more distance between the wall and the foot of the ladder. Plus, you would have to extend the ladder for it to still reach the window. The intuition behind the ladder problem can help us with our hazard detection problem.
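If it helps to see the ladder problem with numbers, here is a minimal sketch in Python. The window height (4 meters) and the two angles are values I made up for illustration, and the function name `ladder_geometry` is my own. The window height is the side adjacent to the angle between the ladder and the wall, the ground distance is the opposite side, and the ladder is the hypotenuse.

```python
import math

def ladder_geometry(window_height_m, angle_from_wall_deg):
    """Return (distance from wall, ladder length) for a ladder that just reaches the window.

    Assumes a vertical wall and flat ground, so the wall, the ground,
    and the ladder form a right triangle.
    """
    theta = math.radians(angle_from_wall_deg)
    # TOA: tan(theta) = opposite / adjacent = ground_distance / window_height
    ground_distance = window_height_m * math.tan(theta)
    # CAH: cos(theta) = adjacent / hypotenuse = window_height / ladder_length
    ladder_length = window_height_m / math.cos(theta)
    return ground_distance, ladder_length

# Made-up example: a 4 m high window, first with a narrow angle,
# then with a wider angle to clear the flower bed.
for angle in (15, 30):
    dist, length = ladder_geometry(4.0, angle)
    print(f"{angle} degrees from the wall: foot is {dist:.2f} m out, ladder is {length:.2f} m long")
```

The wider angle gives both a larger distance from the wall and a longer ladder, which matches the intuition: once the height is fixed, choosing the angle fixes the other two sides.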

We have our camera pointing down at the ground, and like we discussed in Part 1, that camera has a particular field of view that tells us how much of the real world the camera can capture. Let’s say we also know how high up our drone is. If we think of the area covered by the field of view as two right triangles, we can use SOH-CAH-TOA to figure out how much ground a picture is covering:

\(\frac{0.5 \cdot \text{Ground}}{\text{Height}} = \tan{(0.5 \cdot \text{FOV})}\)

\(\text{Ground} = 2 \cdot \text{Height} \cdot \tan{(0.5 \cdot \text{FOV})}\)

(The 0.5s and 2s are there because the single camera image forms two right triangles.) Now we can again hearken back to Part 1 and calculate the ground sample distance for this image: the ground distance the picture covers divided by the number of pixels across the image. This is important because now we know how much space on the ground a single pixel represents. Now our hazard detection algorithms are useful! Any hazards we find can be sent to the navigation filter we talked about in Part 6, or to a more complicated control algorithm, to tell the car to avoid them. Now we have a fully functioning navigation system for a self-driving car!
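To make the calculation concrete, here is a rough sketch in Python. Every number in it is an assumption I picked for illustration (a drone 10 meters up, a 90 degree field of view, an image 1000 pixels across, a hazard 25 pixels wide), and the function names are my own. It also assumes flat ground, a camera pointed straight down, no lens distortion, and a field of view measured along the width of the image.

```python
import math

def ground_width(height_m, fov_deg):
    """Ground distance covered by the image: Ground = 2 * Height * tan(0.5 * FOV)."""
    half_fov = math.radians(fov_deg) / 2
    return 2 * height_m * math.tan(half_fov)

def meters_per_pixel(height_m, fov_deg, pixels_across):
    """How much ground a single pixel covers along the image width."""
    return ground_width(height_m, fov_deg) / pixels_across

def hazard_size_m(hazard_pixels, height_m, fov_deg, pixels_across):
    """Convert a hazard's width in pixels to a width on the ground in meters."""
    return hazard_pixels * meters_per_pixel(height_m, fov_deg, pixels_across)

# Made-up example values, not from any real drone or camera.
height = 10.0   # drone height in meters
fov = 90.0      # camera field of view in degrees
width_px = 1000 # image width in pixels

print(f"Ground covered: {ground_width(height, fov):.2f} m")
print(f"Ground per pixel: {meters_per_pixel(height, fov, width_px):.4f} m")
print(f"A 25-pixel hazard is about {hazard_size_m(25, height, fov, width_px):.2f} m wide")
```

With a real-world size in hand, the car can tell a pebble from a pedestrian before deciding whether to swerve.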

Science for the Unscientific
