The paper When Will AI Exceed Human Performance? Evidence from AI Experts reports that ML researchers assign a 5% chance to “Extremely bad (e.g. human extinction)” consequences from advanced AI, yet they’re quite divided over whether that implies it’s an important problem to work on.
Slate Star Codex expresses confusion about and/or disapproval of (a slightly different manifestation of) this apparent paradox. It’s a pretty clear sign that something is suboptimal.
Here are some conjectures (not designed to be at all mutually exclusive).
Hypothesis 0: The “extremely bad” consequences aren’t nearly as bad as human extinction.
Maybe they were thinking mostly of “Trump becomes president” or “AI researcher jobs become obsolete” levels of badness, not “universe converted to paperclips”? I can believe that some people are sufficiently focused on near-term effects of robocars that they simply ignored the part about “(e.g. human extinction)”. Maybe this explains 10 or 20% of the paradox?
Hypothesis 1: Optimism that risks are easy to deal with.
After all, we’ve got a pretty good track record of handling problems with new technologies that we see coming. We’ve even managed it with nuclear weapons, somehow. [1]
Maybe that 5% figure is what the risk would be if no new thought were devoted to AI safety, but the risk will be much lower assuming that a routine amount of thought is devoted to safety. That doesn’t seem like the right way to interpret the survey questions, but it seems quite possible that some respondents interpreted the questions in ways that seem screwy to me.
I can imagine (with a little effort) that a large minority of researchers are actually confident that the problems will be solved. But only 26% said the value alignment problem (as described by Stuart Russell – see his description in the AI Impacts discussion of the survey) was easier “relative to other problems in the field”.
Hypothesis 1a: Risks will be easier to deal with later, when we have better understanding of AI.
Think of the analogy to global warming: it’s easy to imagine that 50 years from now we’ll have much better technology for controlling global temperature, or for moving away from the predictable consequences. (It’s mainly the possibility of unpredictable consequences that makes me a bit scared to postpone solutions to global warming).
Likewise, I expect ML researchers will make better tools to help us evaluate what neural nets are doing, as a normal part of their work. I can sort of imagine that that’s the tip of an iceberg, and that a dozen or so strategies of this nature will add up to a high probability of a good result.
That’s not remotely adequate to reassure me, but I expect ML researchers to have more confidence in this than I do, due to some combination of selection effects (people who are pessimistic about ML tools are less likely to be ML researchers), greater awareness of ML, and peer pressure. (For hints about how MIRI’s view differs from the ML view, see their intuitions that solving alignment for a simple, abstract model of AI is easier than solving it for a messy real-world AI with tractability constraints).
Hypothesis 1a seems likely to explain something like 30 to 60% of the paradox.
Do disagreements about how AI will be implemented cause disagreements about the difficulty? Here are three straw men(?) theories with different implications for how hard AI risk problems are (exaggerated a bit to emphasize the differences):
- The GOFAI theory: we design an AI that’s smart enough, when turned on, to understand human language. So we can give it a human-language description of what we want it to do, and it will interpret that sensibly.
- The straw-Eliezer theory: we design a seed AI which initially knows nothing about the world, but has a theory of pure intelligence that enables it to develop god-like wisdom. It needs to have goals before it has any clue about what humans are or what human language means. It will by default want to preserve those initial goals, so we probably need to encode human values (or a description of how to find them) in a language that doesn’t contain anything that comes within light-years of a concept like “human being”.
- The straw(?)-ML theory: AIs won’t have explicitly encoded goals in the sense that the theories described above suggest. Goals will be inferred from training examples. Therefore, we need to pick training examples that reflect human values; a minimal sketch of that idea follows this list. (See Paul Christiano’s site for more sophisticated ideas that resemble this.)
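To make the third theory a bit more concrete, here’s a minimal sketch of what “goals inferred from training examples” can mean: a toy reward function fit to hypothetical human preference judgments. Everything in it (feature names, data, numbers) is invented for illustration, and real proposals, such as the ones on Christiano’s site, are far more sophisticated.

```python
# Toy illustration (not anyone's actual proposal) of "goals inferred from
# training examples": fit a reward function to human judgments about which
# of two outcomes is better. All features, data, and numbers are invented.
import numpy as np

# Hypothetical features of an outcome: [task_completed, side_effects, resources_used]
# Each pair is (preferred outcome, rejected outcome) according to a human judge.
pairs = [
    (np.array([1.0, 0.1, 0.3]), np.array([1.0, 0.9, 0.2])),  # judge dislikes side effects
    (np.array([1.0, 0.2, 0.5]), np.array([0.0, 0.0, 0.1])),  # judge wants the task done
    (np.array([1.0, 0.1, 0.2]), np.array([1.0, 0.1, 0.9])),  # judge dislikes waste
]

w = np.zeros(3)  # weights of a linear reward function r(x) = w . x

# Bradley-Terry-style preference learning: the model says the preferred outcome
# wins with probability sigmoid(r(preferred) - r(rejected)); do gradient ascent
# on the log-likelihood of the human judgments.
for _ in range(2000):
    for better, worse in pairs:
        diff = w @ (better - worse)
        p = 1.0 / (1.0 + np.exp(-diff))          # model's probability the judge is matched
        w += 0.1 * (1.0 - p) * (better - worse)  # gradient of the log-likelihood

print("inferred reward weights:", np.round(w, 2))
```

The point of the sketch isn’t the math; it’s that under this theory the system’s “values” are whatever the examples imply, so the safety work shifts toward choosing and checking the training examples.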
I rarely see people explicitly state their assumptions about this, and I’ve seen some hints at wide disagreements that could be caused by confusion about which of those theories are realistic, so I assume this is a moderately important source of misunderstanding about AI risks.
Hypothesis 2: Fatalist pessimism.
“If we’re lucky, they might decide to keep us as pets” – Marvin Minsky, LIFE Nov 20, 1970.
Or Hugo de Garis: “Do we build gods, or do we build our potential exterminators?” I’m unclear whether de Garis would classify human extinction as extremely bad if humans are replaced by more advanced minds, but I expect most people would classify that result as extremely bad.
I haven’t seen much recent evidence of AI researchers taking this attitude, but I expect most people with this attitude aren’t eager to publicize it. So the limited evidence for this belief isn’t very reassuring.
Fatalism would be a somewhat sensible attitude if dealing with AI-related existential risks looked as intractable as cures for aging looked to medical researchers for most of the 20th century. People who tried to cure aging with 20th century tools probably burned out without accomplishing much. It’s somewhat plausible that AI risks are similarly intractable, but the survey results suggest much less of a consensus for that view than was the case for aging in the 20th century.
Hypothesis 3: It would be weird to worry.
Researchers observe that most of their peers are calm, and assume that means those peers know of a good reason not to worry. Therefore, they don’t investigate the risks carefully.
This may have been the main explanation at one time, but leading researchers have expressed enough uncertainty in the past few years to make this seem unlikely to explain a large fraction of the responses.
Hypothesis 4: The costs of reducing risks outweigh the benefits.
If a researcher imagines a 5% risk of extinction, multiplies that by a 0.01% chance that his own efforts would avert the risk, then compares the result to a 50% risk to his job from focusing on the risk, that implies it only makes sense to worry if he values humanity at least 100,000 times as much as he values his job. It would certainly be desirable for researchers to value humanity more highly than that, but it wouldn’t surprise me if they fail to do so.
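For what it’s worth, here’s the arithmetic behind that 100,000 threshold, using the purely hypothetical numbers above:

```python
# Back-of-the-envelope arithmetic for Hypothesis 4, using the hypothetical
# numbers from the paragraph above (not survey data).
p_extinction  = 0.05    # researcher's estimated risk of extinction
p_own_impact  = 0.0001  # chance the researcher's efforts actually avert that risk
p_career_cost = 0.5     # chance that focusing on the risk costs the researcher's job

# Worrying is worthwhile only if
#   value(humanity) * p_extinction * p_own_impact > value(job) * p_career_cost,
# i.e. only if the researcher values humanity at least this many times as much
# as the job:
print(round(p_career_cost / (p_extinction * p_own_impact)))  # 100000
```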
Hypothesis 5: The risks are too distant.
It’s pretty rare for people to care about the distant future. We mostly just follow heuristics which produce rewards within a few years, or occasionally a decade or two.
Even in the case of global warming, where people on both sides of disputes posture as if they cared about effects a century or more out, I see people mainly being interested when they can use the topic to score ideological points, or when there are interest groups that can capture money or power by influencing legislation. If many of those people cared about long-term consequences, I’d expect more interest in Cool Earth, or in projects to make roofs colors that reflect more light, or some other approach where throwing modest amounts of money at the problem would buy us some time.
Those of us who write about AI risks probably aren’t much better – we’re likely influenced by desires to look like we’re trying to save the world.
So it would be surprising for people to worry much about AI risks when they believe that human-level AI is a century away, even if they see nontrivial risks that it will cause a global catastrophe.
Conclusion: there’s no shortage of somewhat plausible explanations for why researchers might report unusual risks, and then act as if the risks were unimportant. It’s somewhat possible that there’s an explanation that should reassure us, but the most promising ones sound like “Trust me, I know what I’m doing”.
[1] – at least for some values of “we”. There were some slight problems with hunter-gatherers who were unable to survive competition with farmers; with Neanderthals who couldn’t compete with humans; and with the residents of the New World who were unprepared for European technology that transported diseases across oceans. But we’ve got better advance warning and means to prepare than they had, right?
4a: extinction is beneficial (one or more of: undeserving humans, deserving progeny, an end to human suffering).
What is the lizardman constant of AI researchers?
AI risk is much worse for mankind than for the individual. If friendly AI would allow us to live millions of years in utopia, while unfriendly AI would painlessly kill us, then even if I think AI will kill me with probability 0.9, AI makes me better off on average.