
I like Andrej and I think he's a brilliant guy. At the end of the day though, there is no way for me or anyone to audit the neural networks or their training data. How can anyone trust them? I think this is really a fundamental problem with using neural networks in self-driving cars--I don't want to put my life in the hands of a fragile system that I, and no one alive, can understand. We are being asked to close our eyes, take a leap of faith, and give complete trust to this system, and just hope it's been trained on every edge case under the sun and nothing out of the ordinary will ever happen while we're in the car.


> I don't want to put my life in the hands of a fragile system that I, and no one alive, can understand. [...] just hope nothing out of the ordinary will ever happen while we're in the car.

This honestly sounds pretty similar to the current state of affairs.

Currently, there are 0 people that fully understand the human brain. If you're driving, and someone turns in front of you suddenly, you can ask "why did they do that", but neither you nor anyone else can truly understand the set of inputs and thoughts that went into that decision at the time. Even if the other driver says "I guess I didn't see you", they also don't understand their own brain. It's probably a post-hoc rationalization.

The argument you're making applies just as well to letting humans drive as to letting opaque neural networks drive... better in fact, since we have a better understanding of neural nets than we do of brains.

I think this is a fundamental problem with driving cars, and it's exactly this reason that I would rather take trains and subways exclusively than drive. Even if the operator's brain has a weird malfunction on a train, we have well understood systems (tracks) that prevent them from turning into traffic.


The main difference is that you have knowledge of how your own brain operates in reality and what kind of actions you might take in a given situation (and by extension the same can be said of other humans), whereas most (if not all?) people have no clue how neural nets operate, so all bets are off when it comes to predicting what the computer could or will do.

Contrast that with a human: you know the subset of possible actions a human would likely take. Even when you see a car swerving and acting like the driver is drunk, you have a mental model of the set of actions to expect from such a car (you might add more distance between you and that driver for example).


How can anyone trust any other complex system? Even something as seemingly trivial as a bridge. Picture driving on an unfamiliar road when suddenly there's a bridge ahead. How can anyone in that situation trust that bridge? Without knowing how it was built, whether it is in a state of disrepair, or whether the two sides of the bridge connect without a precipice in between [1]. The answer is that we don't need to completely trust a system as long as our experience suggests reasonable safety. Once unassisted Tesla proves to be safer than human drivers, we still won't trust it, but that won't prevent us from using it.

[1]: Not relevant to the above comment, but I once encountered such a bridge. It was while driving in the dark, at about 1 am with no traffic, when the navigation on my (iirc) Android 2.1 phone suggested I cross an unfinished bridge. The only warning ahead of the precipice was a couple of traffic cones. The road was familiar, though, and I drove very carefully, curious about the new bridge.


We trust the government regulators to set standards for bridges, we trust civil engineers to build them properly, we know that bridge building has a long history and established body of knowledge, and we know that bridges rarely fall down. Yes, obviously we do not have all the information to verify the integrity of the bridge in that moment, but we can at least trust that an appropriately trained person somewhere on Earth can fully understand the system and can analyze it to verify its integrity. That's not the case with ML--no one understands it, no one can verify it.


> At the end of the day though, there is no way for me or anyone to audit the neural networks or their training data. How can anyone trust them?

By external measurements and statistics. You measure an ML engine by performance. Literally no one anywhere knows how to debug these things by looking inside the model.

If it's a "fragile system" then it will be shown to be so in the data. At the end of the day, either cars crash or they don't. It doesn't matter what kind of neurons made the decisions.


How can you measure the performance of an ML engine in untested conditions? Tesla's whole goal is to achieve L5 capabilities. That means anywhere in any conditions, which is clearly impossible to comprehensively validate with a finite test set. It's plausible that we could get a long period of acceptable safety before a black swan event suddenly causes deaths because it can't handle smoke, heavy rain/fog, or blizzards.


From the video, it looks like the team is trying to minimize the set of untested conditions through simulation and clips from the fleet.

I agree with you that those techniques won't eliminate errors, but I don't think that's the requirement for L5 - human drivers' errors cause lots of deaths today.


That's just Luddism. That logic works for any new technology. How do you know vaccines are safe under all conditions? How do you know planes won't crash under some "black swan" event? How do you know GM food won't poison you? How do you know your fuel tank won't explode? How do you know your ocean liner won't hit an iceberg and sink? There's literally no technology in the world that meets your standards.

You don't know. You measure and you decide, based on numbers and science and moral reasoning about risk. You don't get absolutes from anything else, why are you demanding it from cars?


I'm not sure what in my post gave you the impression that I'm demanding perfection, but it's incorrect. My interpretation of the GP's point is that we know these kinds of perception models have very real failure cases in the presence of noise and adversarial images in the environment, among other possible operating conditions. They're asking what safeguards exist to mitigate the risk of a system failure, which is something other SDC companies have put enormous amounts of work into.

It's a perfectly fair question, and the general expectation we have as a society is that things shouldn't fail catastrophically at random. In a plane, that's achieved to socially acceptable levels by an incredible amount of systems engineering, redundancy, aerodynamics, and ultimately well-trained pilots & ATC. With food it's achieved by government oversight, standards, and the general principle that most things won't harm you. With ships, we simply have them avoid icebergs.

What isn't a good answer is just handwaving about measuring model performance. That's the smallest of the many prerequisites to safety here.


I'm saving this to try to explain why it's irrelevant whether you or I understand 'how' or 'why' a model is working.


It just has to be better than humans.

That's why I think Tesla's approach is wrong. It tries to replicate a human driver. The argument against lidar is "we don't need a lidar to drive, our eyes and brain are enough". Yes, enough to kill thousands of people. You can easily find pictures of involuntarily camouflaged obstacles that take way too long for us to notice, and as expected, vision-based systems fail just like us, while lidar has no problem.

It is hard for computers to be as good as humans when it comes to vision. That's something our brain is exceptionally good at, and a large part of it is dedicated to it. But here, for true self-driving cars to be a reality, they need to be way better than we are. It means that we should throw every advantage machines can have at the problem, and extra sensors are part of it: don't be limited to visible-light cameras. Lidar, radar, sonar, IR, inter-car communication, satellite maps, use everything.

And BTW, focusing on seeing roads is nice, but don't forget that there is more to driving than seeing roads. People communicate with their cars, sometimes in subtle ways; current self-driving tech doesn't, and that's a big reason why these cars feel so alien and other drivers hate them. Self-driving cars should communicate too, and better than humans.


It doesn't need to be better than humans. Slightly worse, but never drunk or tired, still saves thousands of lives. Slightly worse, but never driving crazy or getting distracted by someone in the passenger seat. Worse, but never surreptitiously texting.


Compare it to a horse. No one knows what's going on in the horse's brain. However, we learned to trust it and how to deal with it safely. E.g., blinkers (the devices to limit a horse's peripheral view).


Horse collisions didn't usually end in death. Horses also weren't sprinting at 65 mph through 8-lane highways.


Horse accidents are quite scary things. Way more deaths per mile than cars, I'm sure.


That’s an amazing analogy.


> We are being asked to close our eyes, take a leap of faith

Actually, you're being asked to "Always watch the road in front of you and be prepared to take corrective action at all times. Failure to do so can result in serious injury or death.", until FSD gets released :)

Do you trust cab drivers to "be trained on every edge case under the sun"? Self driving only needs to beat humans, who have some serious flaws, not achieve perfection.


If all airplanes needed to do is "be safer than cars" then no one would ever fly.


> Self driving only needs to beat humans, who have some serious flaws, not achieve perfection.

In particular, though, it needs to beat good human drivers, not average ones.


I would argue the average human driver is actually a good driver. People argue that Tesla training their system to mimic a human driver is a bad idea because then it will replicate bad human driver behavior, but the fact is that human drivers don’t end up killing themselves or others the vast, vast majority of the time. George Hotz of Comma.ai brought up an interesting point about this on one of his Lex Fridman podcast appearances. He said from the driving data they’ve gathered, it seems that average drivers are good in the same ways, while especially bad drivers are bad in unique ways. The result is that the bad driver behavior in the data is pretty much overwhelmed and washed out by the abundance of good driver behavior in the data. Add to this the fact that self-driving systems still have things like automatic emergency braking, and I’d say you’ve got a pretty safe driver.


It depends. If it is equal to the average human driver, then having it on the road would be a benefit, because below-average drivers (drunk, elderly, tired, etc.) could use FSD while above-average drivers drove themselves.


How can you know that it beats humans? You would have to drive tens of trillions of miles to have any worthwhile data on this, all the while putting humans at risk.
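
For a rough sense of the scale (a back-of-envelope sketch in Python, assuming crashes are rare independent events and the commonly cited figure of roughly 1 fatality per 100 million miles; the numbers are illustrative, not anyone's real data):

    baseline = 1 / 100_000_000   # assumed human fatality rate per mile (illustrative)

    # Rule of three: after N miles with zero fatalities, the 95% upper
    # confidence bound on the fatality rate is roughly 3 / N.
    miles_for_parity_claim = 3 / baseline
    print(f"{miles_for_parity_claim:,.0f} fatality-free miles just to bound "
          "the rate at the human baseline")

    # Showing an actual improvement rather than mere parity means comparing two
    # very small rates, which pushes the requirement into the billions of miles,
    # and slicing the data by weather, road type, and software version
    # multiplies it again.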


Why do you need tens of trillions of miles?


Would QA solve that problem? As in, "we replayed 1.4 trillion accident scenarios and our AI behaved correctly on all of them". Safety in numbers.


"We trained a model, overfit on 1.4 trillion accident scenarios, and if behaved correctly on all of them."


Unless your model actually has trillions of parameters (and it doesn't; even GPT-3 only has 175 billion), it is not even possible to overfit on 1.4 trillion training inputs. You can't actually pigeonhole it.


Suppose that you train a neural network to predict the next number in an arithmetic sequence (a, a+b, a+2b, a+3b, a+4b, ...). As input it gets two numbers, the previous number and the current number, and it has to predict the next one.

Suppose you had 1.4 trillion examples in the following test set (using a model with 175 billion parameters):

(1,2)->3

(2,3)->4

(3,4)->5

...

Do you think it is possible to overfit and score perfect on the test set, while failing to generalize?
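
A rough sketch of the failure I mean, with a tiny linear model standing in for the big one (purely illustrative): it answers every (n, n+1) test case perfectly and still fails the moment the step isn't 1.

    import numpy as np

    # Training data where the step is always 1, so "current + 1" explains everything.
    prev = np.arange(1.0, 1000.0)
    curr = prev + 1.0
    target = curr + 1.0

    # Least-squares linear model: next = w1*prev + w2*curr + c
    A = np.column_stack([prev, curr, np.ones_like(prev)])
    w, *_ = np.linalg.lstsq(A, target, rcond=None)

    def predict(p, c):
        return w[0] * p + w[1] * c + w[2]

    print(round(predict(3, 4), 2))   # ~5.0, correct for every (n, n+1) test case
    print(round(predict(2, 4), 2))   # ~5.0 again, but the series 2, 4, ... continues with 6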


I think you've specified this problem in a very strange way. But if you're saying that you're trying to train on the specific dataset where a = 1 and b = 1, then your model will fit the data perfectly with 175 billion parameters. It will also fit the data perfectly with, like, 15 parameters.

If you're trying to fit to some more complex space where a and b are unknown and you're given 3 numbers in the sequence, then what you're trying to fit is `f(a, b) = a + 2(b - a)` (or 2b - a, however you want to represent it), which is a swell function, but if you only give data that can be equally represented by `f(a, b) = b + 1`, you're mis-training your model.

But you could once again do that with a model with a dozen parameters. In both cases, the issue isn't overfitting, but misrepresentative data.


I didn't specify the training set, just the test set. It's possible that your model actually models an arithmetic series. Or that it simply overfits. The point is that it doesn't require trillions of parameters to overfit to a trillion-sized test set.


What you need are more parameters than the complexity of the underlying distribution. If the function you're modelling is just linear, you only need a couple of parameters.

"Overfitting" is memorizing the training data instead of generalizing. The example you're providing isn't overfitting, it's just generalizing to the wrong function. Overfitting would be if the validation set was, say, 30 random values that you got right, but didn't get other values along the same lines correct.

> I didn't specify the training set, just the test set

Then unless you constructed the training set with the intent of mistraining the model, I think a model that got good accuracy on that validation set would generalize.

> The point is that it doesn't require trillions of parameters to overfit to a trillion-sized test set.

You can't "overfit" a validation set, unless you've done something wrong. Overfitting is, by definition, learning the training set too well such that you fail to generalize to a validation set.


Overfitting is, by definition, learning a model that doesn't generalize to the distribution of inputs you care about. If your validation set has the same distribution as the inputs you care about, then your definition holds. But that's definitely not true in practice. Usually the data you collect won't be exactly representative of the conditions you're looking to test, unless your problem is very simple.


> Overfitting is, by definition, learning a model that doesn't generalize to the distribution of inputs you care about.

No, that's just mis-modelling. Overfitting is specifically doing so in a way that learns the training data too well, at the cost of generalizing. If you try to have a single-layer perceptron classify a nonlinear function, it will fail to generalize. But it certainly isn't "overfitting".

Overfitting is not the only form of mistake when training a model. You've presented a different one, which is just like trying to train on misrepresentative data. But that isn't "overfitting", it's just having bad data. Your model isn't "failing to generalize", it has nothing to generalize over.

The classic demonstration of this is that overfitting usually results in an accuracy curve that "frowns" on validation data. Your accuracy peaks, but then decreases as you learn the idiosyncratic structure of the training data instead of the general structure. In your example that won't happen.

Training a model on the wrong problem isn't overfitting. In fact, your example is more like underfitting than overfitting: the model in your example would fail to see the full complexity of the structure, rather than, as in overfitting, making it more complicated than reality.


That isn't overfit, that's fit. Nothing can protect you if your training set just doesn't have any indication of the thing you want it to learn.


I didn't specify that the training set wasn't representative.

All this shows is that you don't need parameters anywhere close to the number of test examples to overfit.


And my point is that that is not what overfitting is. Overfitting is a specific problem where the network fails to recognize a commonality in the training set and instead interprets the irrelevant details of some subset of training samples (in the extreme, individual samples) as distinct properties.

Your example training set is not filled with noise that the network is picking up on to its detriment. Your example training set is simply not representative of the function you are trying to teach.


I don't have an example training set. I don't have an example model.

My exact point is that if your test set isn't representative of the underlying distribution, then accuracy on the test set doesn't mean that your model isn't overfit.


What if the 1.4 trillion accident scenarios were the test set?


k-fold cross validation
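
i.e. rotate which slice of the data gets held out, so every scenario is scored by a model that never trained on it. A minimal sketch with scikit-learn (toy data, purely illustrative):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold, cross_val_score

    X, y = make_classification(n_samples=1000, random_state=0)
    model = LogisticRegression(max_iter=1000)

    # 5 folds: each sample is held out exactly once, so no single static
    # test set gets baked into model selection.
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv)
    print(scores.mean(), scores.std())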


It depends. How many of those scenarios were in fog conditions? How many where a spotlight suddenly blinds some of the cameras? How many where a power line is down, or a manhole cover is missing, or a barn owl/hippo/bear is in the road? If we estimate that it takes 10^5 to 10^6 incidents of a certain type to appropriately train the network for that situation... it's just too difficult. Reality contains infinite edge cases.


It's a long tail that goes on to virtually infinity, yes, but the collective probability of the unhandled edge cases will become vanishingly small, and just an accepted part of living in our reality, which is not perfect and impossible to fully predict and control.


You're not wrong.

But you can say the same things about a human driver.


And each human is unique, making it really difficult to QA/monitor/roll out updates to the fleet.


Don't understand why people are downvoting this. This is the fundamental truth about safety automation like this. We don't need a perfect system to get huge gains here. Humans are really bad at driving, frankly. The bar here is very low.


I think you're right that the bar of "average human on the road's driving skills" is very low. However, I think one of the reasons there's such an inherent distrust of Tesla's FSD is that when it fails, it fails in scenarios in which people can easily see themselves succeeding.

Say, on average, human drivers crash into barricades once every 500,000 miles driven, and say Tesla's FSD can beat that number by crashing into barricades only once every 1,000,000 miles driven. It's still possible (probable?) that many people would trust their own control of the vehicle more than Tesla's FSD AI. And they might be justified in doing so!

Why? Because human-caused accidents are not uniformly distributed among the driving population. Insurance companies have employed actuarial analysts for decades to split the driving population up into buckets of greater and lesser risk. Thus, if you're a 17-year-old male, or an 85-year-old woman, or have several DUIs, you're likely to pay higher premiums than most other drivers.

If you're not in one of those high-risk categories though, it's possible your driving performance exceeds the average, and maybe you might only crash into barricades once out of every 2,000,000 miles driven. In that case, you'd be right to trust your skills ahead of the AI.

My example is contrived. It's possible that the FSD AI leapfrogs even the most skilled human drivers. But until it does, there may be rational reasons for many folks to continue distrusting it.
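
To put those contrived rates in concrete terms (rough sketch; the per-mile rates are the made-up ones above, plus an assumed driving lifetime of about 600,000 miles):

    import math

    lifetime_miles = 600_000   # assumption: roughly 40 years at 15k miles/year
    rates = {
        "average human":            1 / 500_000,
        "hypothetical FSD":         1 / 1_000_000,
        "hypothetical safe driver": 1 / 2_000_000,
    }

    for who, per_mile in rates.items():
        p = 1 - math.exp(-per_mile * lifetime_miles)   # P(at least one barricade crash)
        print(f"{who}: ~{p:.0%}")

Under those (entirely made-up) numbers, the safer-than-average driver really does come out ahead by keeping the wheel, which is exactly the point.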


> Because human-caused accidents are not uniformly distributed among the driving population.

This is such a great point. Controlling for driving circumstances (e.g. weather, location, etc.), human-caused accidents are not uniformly distributed across the population, but machine-caused accidents are.

This, in my mind, is the fundamental reason people are distrustful of self-driving cars. Everyone thinks they’re far better than the median driver, even if this is mathematically impossible.


It's not just about skill. A lot of accidents are under the influence. That's a choice. While you can't control what other drivers are doing (and neither can the Tesla), you can at least significantly increase your odds of safety by some simple choices such as this.


> A lot of accidents are under the influence. That's a choice.

Only for one of the drivers! There are at least two cars involved in almost every fatal accident.

I made this point elsewhere but it bears repeating. Even if you think your driving is so perfect you can't benefit from a self-driving car, don't you want everyone else in one?


> My example is contrived.

Pretty much. I think if you're going to make an argument like "accidents by teenagers, senior citizens and drunks don't matter" you need to put some numbers behind that.

I mean, if you had a teenager or a family member with a substance problem, would you feel safer with them in a Tesla? If so, then I really don't think I understand your point.

Basically you're just saying that you, personally, as a young single man, feel like you're much better at driving than the median driver. So you want a car that is significantly better than even the median before you will "feel safe".

Alternate framing: it doesn't matter how safe you think your driving is, you're still sharing the road with grandparents. And you (if you're being rational about it) want them in a Tesla.


> And you (if you're being rational about it) want them in a Tesla

Right. Rationally, we all want folks who are worse than the state-of-the-art FSD AI to be using FSD. But thinking that the "very low bar" that has to be cleared is drunk drivers misses the point that most drivers are not drunk. So it's much less impressive to a "safe" human driver to clear the low bar.


> But thinking that the "very low bar" that has to be cleared is drunk drivers

I said median, not drunk driver. The Autopilot that's already shipping is preventing DUI fatalities today. This feels like a strawman to me...


> I said median, not drunk driver.

Right, and I originally said:

> I think one of the reasons there's such an inherent distrust of Tesla's FSD is that when it fails, it fails in scenarios in which people can easily see themselves succeeding.

The above-the-median driver is rationally able to see themselves beating Tesla's FSD. But importantly, even below-the-median drivers are likely to think they can beat Tesla's FSD for as long as the mistakes it makes seem naive and trivially-avoided by them. Human trust will require a higher bar than "beats the median human on 5 KPIs, but still on rare occasions decapitates someone because it doesn't know what a semi truck looks like from the side".

FSD advocates seem to want to focus on the "beats the median human" bit. I'm saying it's unlikely they'll gain much public trust until the "can't recognize a sideways semi" stories die down.


That's ridiculous. Humans are quite good at driving, along with many other complex tasks. The fact that we see about 1 death for every 100,000,000 miles is pretty fantastic, especially when you consider how heavily that is tilted towards impaired drivers.

Exclude the worst 1% of drivers and the average goes up considerably.



