How do self-driving cars recognize cyclists, pedestrians and other vulnerable road users? How do they all interact on real-life city roads? Megan checks in with Argo AI’s Director of Autonomy, Peter Carr, to learn how the technology actually works.

Episode Transcript

Megan Harris

Hey everyone. This is No Parking, the podcast that cuts through the hype around self-driving vehicles and artificial intelligence. I’m the producer of the show, Megan Harris.

Now, the main feed of this podcast has always been, and will always be, fueled by those honest conversations with our hosts, Alex Roy and Bryan Salesky. But this season you’ll hear from me in the margins, breaking down some of the complexity and nuance around the technology itself.

Today, I’m joined by Peter Carr. He’s the director of autonomy at Argo AI and a lot of his work has focused on computer vision and machine learning. Peter, thanks so much for joining us.

Peter Carr

My pleasure. Thanks very much for having me.

Megan Harris

For a lot of us, those sound more like buzzwords than real concepts. Break it down for us. What does “vision” mean when we’re talking about a self-driving car?

Peter Carr

Computer vision is really the whole field of getting computers to understand the contents of the images they’re looking at. That means recognizing the objects, and then a big part, especially for robotics, is figuring out the 3D spatial locations of everything. What you get to see is the image plane, but at the end of the day we care about where things are in the real world.
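
To make that image-plane versus real-world point concrete, here is a rough sketch, with made-up camera numbers, of how a pixel plus a depth measurement (say, from lidar) can be turned into a 3D position. It illustrates the general idea only, not Argo AI’s software.

```python
# Illustrative only: back-project a pixel to a 3D point using a pinhole camera
# model. The intrinsics (focal lengths and principal point) are invented values.

import numpy as np

# Assumed camera intrinsics: focal lengths fx, fy and principal point (cx, cy).
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])

def backproject(u: float, v: float, depth_m: float) -> np.ndarray:
    """Convert a pixel (u, v) with a known forward depth into a 3D point in the camera frame."""
    pixel = np.array([u, v, 1.0])
    ray = np.linalg.inv(K) @ pixel   # direction through that pixel, z = 1
    return ray * depth_m             # scale the ray by the measured depth

if __name__ == "__main__":
    # A detection centered at pixel (800, 400), 12.5 m ahead of the camera.
    print(backproject(800, 400, 12.5))  # [x, y, z] in metres, camera frame
```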

Megan Harris

So in that vein, there’s a term that autonomous vehicle companies use a lot, “VRU,” which stands for vulnerable road user. As I understand it, that could be anyone on the road who’s moving slower than a vehicle? And is there a hierarchy, with some VRUs more at risk than others?

Peter Carr

At Argo, we divide all the objects on the road into two groups: people inside a steel safety cage, and basically everyone who isn’t. Anyone outside that cage is a vulnerable road user. That could be somebody riding in the back of a pickup truck, a cyclist, or even somebody jogging down the road, as well as animals, really pretty much anything in the world that’s moving or capable of moving. You want to consider that you have to give it extra margin and space, and that’s what we’re really describing with the vulnerable road user category to begin with.

And yes, there are different categories within it too. In the case of a cyclist, sometimes they’re moving fast enough that they’re in the flow of traffic. Then there are other cases, like pedestrians jaywalking: they’re never in the flow of traffic, but they’re certainly moving in the same space as the vehicles. So how you, as a driver, reason about whether something is in the flow of traffic or not is one of those core distinctions as well.

Megan Harris

So when a VRU — a cyclist, a pedestrian, a puppy, for example — crosses a test vehicle’s path, what does the vehicle see? How does it see it?

Peter Carr

Right. So we have a variety of sensors on the car. We strongly believe in lidar, so that’s giving us laser measurements, and we’d get multiple returns off a human being, an animal, a cyclist, and so forth. That allows us to understand the shape and extent of the object, as well as how fast it’s moving and in what direction. We also have cameras, and those are what we use to determine the type of object it might be, so the system could recognize it as a person or as an animal. It’s entirely plausible, too, that it’s an object the computer hasn’t been trained to recognize. In those cases, we can still treat it as a vulnerable road user to begin with.
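
As an illustration of that fallback, here is a small hypothetical sketch: the lidar track supplies shape and motion, the camera classifier supplies a label and a confidence, and anything that isn’t confidently a vehicle gets treated as a vulnerable road user. The class names, threshold, and fields are assumptions for the example, not Argo AI’s actual categories.

```python
# A minimal sketch of the idea: unknown or low-confidence detections default
# to the vulnerable road user (VRU) category, which gets extra margin and space.

from dataclasses import dataclass

PROTECTED = {"car", "truck", "bus"}            # people inside a steel safety cage
KNOWN_VRUS = {"pedestrian", "cyclist", "animal"}

@dataclass
class TrackedObject:
    label: str          # best guess from the camera classifier
    confidence: float   # classifier confidence in that label, 0..1
    speed_mps: float    # speed estimated from successive lidar returns

def is_vulnerable_road_user(obj: TrackedObject, min_confidence: float = 0.7) -> bool:
    """Treat anything that is not confidently a protected vehicle as a VRU."""
    if obj.confidence >= min_confidence and obj.label in PROTECTED:
        return False
    # Known VRUs, low-confidence detections, and object types the model has
    # never been trained on all land in the VRU category.
    return True

if __name__ == "__main__":
    print(is_vulnerable_road_user(TrackedObject("cyclist", 0.95, 6.0)))   # True
    print(is_vulnerable_road_user(TrackedObject("car", 0.98, 12.0)))      # False
    print(is_vulnerable_road_user(TrackedObject("unknown", 0.30, 1.2)))   # True
```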

I think the other distinction too is there’s lots of people moving on hoverboards, skateboards, right?

Megan Harris

Oh my gosh, you’re right.

Peter Carr

A cyclist is really our model of a fast-moving vulnerable road user. We know there’s a chance they could be in the flow of traffic, but they may not be. And even if you don’t see a bike, it could easily be a person on a hoverboard or a skateboard, or somebody sprinting or jogging down the side of the road, all of which you’ll see in Pittsburgh. The world isn’t simply cyclists and pedestrians; it’s really fast-moving and slow-moving vulnerable road users, and the distinction is that when you’re fast-moving, you may also be in the flow of traffic. So it’s one more level of complexity for the cars to reason about.
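
A toy version of that fast/slow split might look like the following. The speed cutoff is an invented number, purely for illustration; it is not an Argo AI parameter.

```python
# Illustrative only: a made-up cutoff separating slow-moving VRUs (pedestrians,
# dog walkers) from fast-moving ones (cyclists, skateboarders), noting that
# fast movers may also be in the flow of traffic.

FAST_VRU_SPEED_MPS = 3.0  # assumed cutoff, roughly between walking and riding

def describe_vru(speed_mps: float) -> str:
    """Return the coarse category a planner might reason about."""
    if speed_mps >= FAST_VRU_SPEED_MPS:
        return "fast-moving VRU (may be in the flow of traffic)"
    return "slow-moving VRU (shares the space, not the flow)"

if __name__ == "__main__":
    print(describe_vru(1.4))  # someone walking
    print(describe_vru(6.5))  # a cyclist or skateboarder at speed
```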

Megan Harris

Level of complexity. Is that tech speak for harder?

Peter Carr

That’s one way to put it. Yes.

Megan Harris

You mentioned “training” a computer. How do you train a computer?

Peter Carr

So this comes back to computer vision. It really is a lot like training a toddler at times: this is a person, right? This is a cyclist on an upright bicycle; this one is on a recumbent bicycle. This is what a dog looks like. This is what a cat looks like.

Megan Harris

So repetition.

Peter Carr

Repetition. And then the whole field is really about pattern recognition, learning, “Oh, these are the characteristics that tell me it’s a dog versus a cat versus some other object.”

Megan Harris

But when you’re presenting these images, these things the computer can then label and learn from, is it on a human to make that initial distinction early on?

Peter Carr

At the very beginning, yes. A lot of it is really labor-intensive human work to say these are the core objects in the world, and you start from that. Then, once you’ve got that, a lot of the time we switch into a field called active learning, where you’re correcting the mistakes the computer is making and figuring out where it’s having trouble, say, telling the difference between a pedestrian and a mannequin. They look really, really similar, and that subtle distinction takes a lot of examples to learn.
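
Here is a hedged sketch of the loop Peter describes: train on a small hand-labeled set, then send only the examples the model is least sure about back to a human for labels. The synthetic “pedestrian versus mannequin” features and the uncertainty rule are assumptions for illustration, not Argo AI’s pipeline.

```python
# A minimal active-learning loop using uncertainty sampling on synthetic data.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def fake_features(n, label):
    """Stand-in for image features: two overlapping clusters, one per class."""
    center = np.array([1.0, 1.0]) if label else np.array([0.0, 0.0])
    return center + rng.normal(scale=0.8, size=(n, 2))

# Small hand-labeled starting set (0 = mannequin, 1 = pedestrian).
X_train = np.vstack([fake_features(20, 0), fake_features(20, 1)])
y_train = np.array([0] * 20 + [1] * 20)

# Large unlabeled pool; its true labels stand in for a human labeler.
X_pool = np.vstack([fake_features(500, 0), fake_features(500, 1)])
y_pool = np.array([0] * 500 + [1] * 500)

for round_num in range(5):
    model = LogisticRegression().fit(X_train, y_train)
    probs = model.predict_proba(X_pool)[:, 1]
    # Uncertainty sampling: pick the examples closest to the decision boundary.
    hard = np.argsort(np.abs(probs - 0.5))[:10]
    # The "human" labels the hard cases; add them to the training set.
    X_train = np.vstack([X_train, X_pool[hard]])
    y_train = np.concatenate([y_train, y_pool[hard]])
    X_pool = np.delete(X_pool, hard, axis=0)
    y_pool = np.delete(y_pool, hard)
    print(f"round {round_num}: {len(y_train)} labeled examples")
```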

Megan Harris

So can the car do that? Can it tell the difference between a human and a mannequin?

Peter Carr

It can make the distinction, but at the end of the day there are so many other contextual factors that go into the decision making, and it’s always about the what-ifs. So we’re very conservative in asking: what are all the different ways this scenario could play out? Whether it’s actually a person or a mannequin, how do you make the right, safe decision so you have the space and time to assess what’s actually going on? Is it moving? And then ultimately decide what the car should be doing.

Megan Harris

So let’s put it into a really common hypothetical. You’ve got a person standing at a crosswalk, at a red light where the test vehicle has just driven up. They’re waiting, but the car doesn’t necessarily know for sure whether the person is going to walk across that crosswalk, take off running in another direction, or just stand there playing on their phone. How does the car take everything that it’s seeing, that computer vision, and then react to it? What happens between Step A and Step B?

Peter Carr

The key distinction is that when you’re predicting the future, you’re not just making one guess about what could happen; you’re really laying out all the things that could happen. So in the case of a person near the curb, one option, of course, is that they stay there. Another option is that they step out, or they might hesitate for a moment, or start running, or it could be a cyclist at speed on the sidewalk. The goal of the prediction system is to give the planner a sense of the whole range of possibilities to be considering. Context is evidence too: if the light is red, the pedestrian is probably staying on the sidewalk. As soon as that light turns green, you might not yet see any motion showing the pedestrian intends to start crossing, but the change in circumstances means there’s a chance they’re about to start moving. That’s a lot of the contextual reasoning that goes into understanding whether we expect this person to step onto the road or to stay where they are and stay put.
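
To sketch what handing the planner a range of possibilities might look like, here is an illustrative toy in code. The behaviors and probabilities are invented numbers that simply follow the logic Peter describes (a red light weights toward staying put, a fresh green raises the chance of crossing); this is not Argo AI’s prediction system.

```python
# A toy multi-hypothesis prediction for a pedestrian waiting near a crosswalk.

from dataclasses import dataclass

@dataclass
class Hypothesis:
    behavior: str       # e.g. "stay on curb", "step into crosswalk"
    probability: float  # the planner considers every hypothesis with nonzero weight

def predict_pedestrian(light_state: str, is_moving: bool) -> list[Hypothesis]:
    """Return a set of weighted futures rather than a single guess."""
    if light_state == "red" and not is_moving:
        weights = {"stay on curb": 0.85, "step into crosswalk": 0.10, "run across": 0.05}
    elif light_state == "green" and not is_moving:
        # The light change alone raises the chance they are about to cross,
        # even before any motion is observed.
        weights = {"stay on curb": 0.40, "step into crosswalk": 0.50, "run across": 0.10}
    else:
        weights = {"stay on curb": 0.10, "step into crosswalk": 0.60, "run across": 0.30}
    return [Hypothesis(b, p) for b, p in weights.items()]

if __name__ == "__main__":
    for h in predict_pedestrian(light_state="green", is_moving=False):
        print(f"{h.behavior}: {h.probability:.2f}")
```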

Megan Harris

So for anyone who does see an autonomous vehicle while they’re strolling down a sidewalk or, say, walking their dog, what would you have them know about what’s going on inside the vehicle and how it’s likely to respond to them, if indeed they do come upon a crosswalk and need to walk in front of the car?

Peter Carr

Driving is a social interaction between the drivers of vehicles, or, in this case, between the vehicles themselves and all of the other road users, like a pedestrian stepping out. There’s an implicit communication that goes on. Some of that is with the driver in the car, but really it comes down to what I call the body language of the car: is it slowing down as it approaches the stop sign? Are you seeing the other cues in how the vehicle is behaving that tell you it acknowledges there’s a pedestrian here, that it’s planning to yield, to come to a stop and wait, and to take turns? It’s much the same as any other pedestrian trying to figure out whether that driver is actually going to stop for them or not; it’s the very same sort of communication that has to happen. So not only does the car have to understand the world and how to navigate it, it also needs to understand how to communicate its intentions to everybody else in the world.

Megan Harris

Peter Carr is the director of autonomy at Argo AI. Peter, thanks so much for joining me.

Peter Carr

No problem. Thanks very much, Megan.

Megan Harris

That’s it for this episode of No Parking. If you’re liking the new season, please let us know by reviewing the show on Apple Podcasts, Google, Spotify, and more. You can find us on Twitter at @noparkingpod. 

I’m Megan Harris, the producer of the program. This is the No Parking Podcast.