I’ve recently noticed some possibly important confusion about machine learning (ML)/deep learning. I’m quite uncertain how much harm the confusion will cause.
On MIRI’s Intelligent Agent Foundations Forum:
> If you don’t do cognitive reductions, you will put your confusion in boxes and hide the actual problem. … E.g. if neural networks are used to predict math, then the confusion about how to do logical uncertainty is placed in the black box of “what this neural net learns to do”
And from a SlateStarCodex comment:

> Imagine a future inmate asking why he was denied parole, and the answer being “nobody knows and it’s impossible to find out even in principle” … (DeepMind employs a Go master to help explain AlphaGo’s decisions back to its own programmers, which is probably a metaphor for something)
A possibly related confusion, from a conversation that I observed recently: philosophers have tried to understand how concepts work for centuries, but have made little progress; therefore deep learning isn’t very close to human-level AGI.
I’m unsure whether any of the claims I’m criticizing reflect actually mistaken beliefs, or whether they’re just communicated carelessly. I’m confident that at least some people at MIRI are wise enough to avoid this confusion. I’ve omitted some ensuing clarifications from my description of the deep learning conversation – maybe if I remembered those sufficiently well, I’d see that I was reacting to a straw man of that discussion. But it seems likely that some people were misled by at least the SlateStarCodex comment.
There’s an important truth that people refer to when they say that neural nets (and machine learning techniques in general) are opaque. But that truth gets seriously obscured when rephrased as “black box” or “impossible to find out even in principle”.
The difficulty of understanding the behavior of a fully trained neural net is mostly due to the complexity of the information that it encodes. The rest of the difficulty is a byproduct of designers focusing on goals that are unrelated to understandability.
This is similar to what happens with the tax code, and with large software projects whose creators are careless about maintainability.
It’s generally straightforward to show that a choice made by a neural net can be explained “clearly”, by something of the form:
0.79316*x + 0.020733*(0.85298*y + 0.00555*x + 0.24759*z) + … [continued for 20 or 1000 more terms]
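As a toy illustration (the weights here are arbitrary, taken from the expression above, and don’t come from any real system), a small net’s output really is just such a nested weighted sum, which can be written out and expanded term by term:

```python
# A tiny one-hidden-unit network with fixed, arbitrary weights.
# Its output is exactly a nested weighted sum of the inputs --
# the "clear" but unilluminating explanation described above.

def hidden(x, y, z):
    # one hidden unit: a weighted sum of the raw inputs
    return 0.85298 * y + 0.00555 * x + 0.24759 * z

def output(x, y, z):
    # the net's decision score, written out in full
    return 0.79316 * x + 0.020733 * hidden(x, y, z)

# Expanding the expression gives each input's total coefficient:
#   x: 0.79316 + 0.020733 * 0.00555
#   y: 0.020733 * 0.85298
#   z: 0.020733 * 0.24759
score = output(1.0, 1.0, 1.0)
```

Nothing here is hidden in principle; the problem is that a real net has thousands or millions of such terms, not three.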
It’s also possible in principle (but often not in practice) to trace those criteria back to human choices about what training examples to give the neural net.
For the inmate who was denied parole by a neural net, it should be pretty straightforward to argue that everything which influenced the decision looked like a somewhat sensible factor to consider.
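A minimal sketch of what such an argument might look like, assuming a linear scoring model with entirely made-up features and weights (none of these names or numbers come from any real system): compute each factor’s contribution to the score, rank them by influence, and check that each one looks at least somewhat sensible.

```python
# Hypothetical parole-scoring model. All features and weights are
# invented for illustration; only the ranking technique is the point.

features = {                       # the inmate's (fictional) attributes
    "prior_convictions": 3.0,
    "age_at_release":   42.0,
    "program_completed": 1.0,
}
weights = {                        # made-up model coefficients
    "prior_convictions": -0.8,
    "age_at_release":     0.02,
    "program_completed":  0.5,
}

# Each factor's contribution to the overall decision score.
contributions = {name: weights[name] * features[name] for name in features}

# Sort by absolute influence, largest first, for human review.
ranked = sorted(contributions.items(), key=lambda kv: -abs(kv[1]))
for name, value in ranked:
    print(f"{name:20s} {value:+.2f}")
```

Each line of that output can then be judged on its face (“is it sensible for prior convictions to count against parole?”) without anyone needing to claim the model is a black box.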
Whereas with the current system, the inmate ought to be told there’s nearly a 50% chance that he was denied parole due to the time that elapsed since the judge’s last meal. Evidence of this problem doesn’t seem to have caused many calls to abolish the current system.
That’s not to say we can be confident the neural net was fair. Maybe it will predict that orange-haired people have a higher recidivism rate than bald people. Yes, that could indicate a subtle bias in the choice of examples used to train it. Or it could indicate that the neural net found a valuable indicator of recidivism. Distinguishing between those hypotheses requires more knowledge of the human mind than current AIs can manage.
Something feels wrong about expecting ML researchers to resolve such fairness controversies, but it’s unclear who would handle them better.
My experience playing with neural nets (mostly in the early 90s) suggests that neural net behavior reflects how humans implement concepts.
People have trouble accepting the kinds of explanations that ML provides, because they have multiple goals for what an explanation should accomplish, chosen with inadequate regard to how feasible they are. I suspect those goals are usually some combination of these (where X could be the concept of a chair, the decision to parole an inmate, etc.):
- Translating X into language typically used by philosophers or in ordinary conversation, without losing any information.
- Expressing X in a way that avoids arbitrary-looking numbers.
- Expressing X in a way that fits within human working memory.
- Expressing X in a way that separates out the understanding that matters for context Y while hiding all the complexity that got added to handle more obscure contexts.
- Understanding X in a way that fully reflects the human intentions that influence whether an object is classified as X.
Some of these are at least as hard as creating human-level AGI. Possibly harder. Only the last one looks like a feature which is both missing from current ML models and possibly important for human-level AGI.
We imagine we can get something nicer-looking because for any particular decision (should X be paroled? is Y a chair?), we have lots of context that helps us dismiss some considerations as irrelevant, and because we have sophisticated models of human intentions to guide our choices of what’s relevant.
I’m fairly confident that concepts are sufficiently solved that they’re not a major obstacle to ML progress. But my knowledge of ML is insufficient for me to say how close ML systems are to combining concepts as flexibly as humans do (most likely that is still a hard problem).
Footnotes:

1. I can imagine that MIRI’s agent foundations agenda is partly motivated by confusion over how mysterious ML techniques are. However, I suspect MIRI people have a coherent, but not very compelling, claim that is somewhat distinct from the target of this blog post.

2. There’s a dramatic gender imbalance in Google image search results for “orange hair”. That says something bad about either Google’s software or its users.