Artificial Intelligence

Some comments on last weekend’s Foresight Conference:

At lunch on Sunday I was in a group dominated by a discussion between Robin Hanson and Eliezer Yudkowsky over the relative plausibility of new intelligences having a variety of different goal systems versus a single goal system (as in a society of uploads versus Friendly AI). Some of the debate focused on how unified existing minds are, with Eliezer claiming that dogs mostly don’t have conflicting desires in different parts of their minds, and Robin and others claiming such conflicts are common (e.g. when deciding whether to eat food the dog has been told not to eat).

One test Eliezer suggested for the power of systems with a unified goal system is that if Robin were right, bacteria would have outcompeted humans. That got me wondering whether there’s an appropriate criterion by which humans can be said to have outcompeted bacteria. The most obvious criterion on which humans and bacteria are trying to compete is how many copies of their DNA exist. Using biomass as a proxy, bacteria are winning by several orders of magnitude. Another possible criterion is impact on large-scale features of Earth. Humans have not yet done anything that seems as big as the catastrophic changes to the atmosphere (“the oxygen crisis”) produced by bacteria. Am I overlooking other appropriate criteria?
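For what it’s worth, the biomass comparison is easy to make concrete with a back-of-the-envelope calculation. The figures below are rough order-of-magnitude estimates of the sort that get quoted, not precise measurements:

```python
import math

# Rough order-of-magnitude estimates (published figures vary widely; these are illustrative only):
bacterial_carbon_gt = 100.0   # global bacterial biomass in gigatons of carbon (estimates range roughly 70-550)
human_carbon_gt = 0.06        # global human biomass in gigatons of carbon

ratio = bacterial_carbon_gt / human_carbon_gt
print(f"Bacteria outweigh humans by roughly {ratio:,.0f}x, "
      f"i.e. about {math.log10(ratio):.0f} orders of magnitude")
```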

Kartik Gada described two humanitarian innovation prizes that bear some resemblance to a valuable approach to helping the world’s poorest billion people, but will be hard to turn into something with a reasonable chance of success. The Water Liberation Prize would be pretty hard to judge. Suppose I submit a water filter that I claim qualifies for the prize. How will the judges test the drinkability of the water and the reusability of the filter under common third world conditions (which I suspect vary a lot and which probably won’t be adequately duplicated where the judges live)? Will they ship sample devices to a number of third world locations and ask whether they produce water that tastes good, or will they do rigorous tests of water safety? With a hoped-for prize of $50,000, I doubt they can afford very good tests. The Personal Manufacturing Prizes seem somewhat more carefully thought out, but need some revision. The “three different materials” criterion is not enough to rule out overly specialized devices without some clear guidelines about which differences are important and which are trivial. Setting specific award dates appears to assume an implausible ability to predict how soon such a device will become feasible. The possibility that some parts of the device are patented is tricky to handle, as it isn’t cheap to verify the absence of crippling patents.

There was a debate on futarchy between Robin Hanson and Mencius Moldbug. Moldbug’s argument seems to boil down to the absence of a guarantee that futarchy will avoid problems related to manipulation/conflicts of interest. It’s unclear whether he thinks his preferred form of government would guarantee any solution to those problems, and he rejects empirical tests that might compare the extent of those problems under the alternative systems. Still, Moldbug concedes enough that it should be possible to incorporate most of the value of futarchy within his preferred form of government without rejecting his views. He wants to limit trading to the equivalent of the government’s stockholders. Accepting that limitation isn’t likely to impair the markets much, and may make futarchy more palatable to people who share Moldbug’s superstitions about markets.

Book review: Moral Machines: Teaching Robots Right from Wrong by Wendell Wallach and Colin Allen.

This book combines the ideas of leading commentators on ethics, methods of implementing AI, and the risks of AI, into a set of ideas on how machines ought to achieve ethical behavior.

The book mostly provides an accurate survey of what those commentators agree and disagree about. But there’s enough disagreement that we need some insights into which views are correct (especially about theories of ethics) in order to produce useful advice to AI designers, and the authors don’t have those kinds of insights.

The book focuses mainly on the near-term risks of software that is much less intelligent than humans, and is complacent about the risks of superhuman AI.

The implications of superhuman AIs for theories of ethics ought to illuminate flaws in those theories that aren’t obvious when considering purely human-level intelligence. For example, they mention an argument that any AI would value humans for their diversity of ideas, which would help AIs to search the space of possible ideas. This argument has serious problems, such as: what would stop an AI from fiddling with human minds to increase their diversity? Yet the authors are too focused on human-like minds to imagine an intelligence that would do that.

Their discussion of the advocates of friendly AI seems a bit confused. The authors wonder whether those advocates are trying to quell apprehension about AI risks, when I’ve observed pretty consistent efforts by those advocates to create apprehension among AI researchers.

Book review: Global Catastrophic Risks, edited by Nick Bostrom and Milan Cirkovic.
This is a relatively comprehensive collection of thoughtful essays about the risks of a major catastrophe (mainly those that would kill a billion or more people).
Probably the most important chapter is the one on risks associated with AI, since few people attempting to create an AI seem to understand the possibilities it describes. It makes some implausible claims about the speed with which an AI could take over the world, but the argument those claims are used to support only requires that a first-mover advantage be important, and that is only weakly dependent on assumptions about the speed with which AI will improve.
The risk of a large fraction of humanity being killed by a super-volcano is apparently higher than the risk from asteroids, but volcanoes have more of a limit on their maximum size, so they appear to pose less risk of human extinction.
The risks of asteroids and comets can’t be handled as well as I thought by early detection, because some dark comets can’t be detected with current technology until it’s way too late. It seems we ought to start thinking about better detection systems, which would probably require large improvements in the cost-effectiveness of space-based telescopes or other sensors.
Many of the volcano and asteroid deaths would be due to crop failures from cold weather. Since mid-ocean temperatures are more stable than land temperatures, ocean-based aquaculture would help mitigate this risk.
The climate change chapter seems much more objective and credible than what I’ve previously read on the subject, but is technical enough that it won’t be widely read, and it won’t satisfy anyone who is looking for arguments to justify their favorite policy. The best part is a list of possible instabilities which appear unlikely but which aren’t understood well enough to evaluate with any confidence.
The chapter on plagues mentions one surprising risk – better sanitation made polio more dangerous by altering the age at which it infected people. If I’d written the chapter, I’d have mentioned Ewald’s analysis of how human behavior influences the evolution of strains which are more or less virulent.
There’s good news about nuclear proliferation which has been under-reported – a fair number of countries have abandoned nuclear weapons programs, and a few have given up nuclear weapons. So if there’s any trend, it’s toward fewer countries trying to build them, and a stable number of countries possessing them. The bad news is we don’t know whether nanotechnology will change that by drastically reducing the effort needed to build them.
The chapter on totalitarianism discusses some uncomfortable tradeoffs between the benefits of some sort of world government and the harm that such government might cause. One interesting claim:

totalitarian regimes are less likely to foresee disasters, but are in some ways better-equipped to deal with disasters that they take seriously.

Steve Omohundro has recently written a paper and given a talk (a video should become available soon) on AI ethics with arguments whose most important concerns resemble Eliezer Yudkowsky’s. I find Steve’s style more organized and more likely to convince mainstream researchers than Eliezer’s best attempt so far.
Steve avoids Eliezer’s suspicious claims about how fast AI will take off, and phrases his arguments in ways that are largely independent of the takeoff speed. But a sentence or two in the conclusion of his paper suggests that he is leaning toward solutions which assume multiple AIs will be able to safeguard against a single AI imposing its goals on the world. He doesn’t appear to have a good reason to consider this assumption reliable, but at least he doesn’t show the kind of disturbing certainty that Eliezer has about the first self-improving AI becoming powerful enough to take over the world.
Possibly the most important news in Steve’s talk was his statement that he had largely stopped working to create intelligent software due to his concerns about safely specifying goals for an AI. He indicated that one important insight that contributed to this change of mind came when Carl Shulman pointed out a flaw in Steve’s proposal for a utility function which included a goal of the AI shutting itself off after a specified time (the flaw involves a small chance of physics being different from apparent physics and how the AI will evaluate expected utilities resulting from that improbable physics).
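I don’t know the exact details of the flaw Carl pointed out, but the general phenomenon (a low-probability hypothesis swamping an expected-utility comparison) is easy to sketch. Here is a toy illustration of that phenomenon, with all numbers hypothetical:

```python
# Toy illustration of how a low-probability hypothesis about physics can dominate
# an expected-utility comparison. All numbers here are hypothetical.
p_weird_physics = 1e-9       # the AI's credence that physics differs from apparent physics
u_shutdown_normal = 100.0    # utility of shutting down on schedule, under normal physics
u_stay_on_normal = 99.0      # utility of staying on, under normal physics (slightly worse)
u_stay_on_weird = 1e15       # utility the AI expects from staying on, if the weird hypothesis is true
# (assume shutting down yields ~0 utility in the weird-physics branch)

eu_shutdown = (1 - p_weird_physics) * u_shutdown_normal
eu_stay_on = (1 - p_weird_physics) * u_stay_on_normal + p_weird_physics * u_stay_on_weird

print(f"EU(shut down on schedule) = {eu_shutdown:.2f}")
print(f"EU(stay on)               = {eu_stay_on:.2f}")
# The 1e-9 branch contributes about 1e6 to EU(stay on), swamping the comparison.
```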

Tim Freeman has a paper which clarifies many of the issues that need to be solved for humans to coexist with a superhuman AI. It comes close to what we would need if we had unlimited computing power. I will try to expand on some of the criticisms of it from the sl4 mailing list.
It errs on the side of our current intuitions about what I consider to be subgoals, rather than trusting the AI’s reasoning to find good subgoals for meeting the primary human goal(s). Another way to phrase that: it fiddles with parameters to get special-case results that fit our intuitions, rather than focusing on general-purpose solutions that would be more likely to produce good results in conditions that we haven’t yet imagined.
For example, concern about whether the AI pays the grocer seems misplaced. If our current intuitions about property rights continue to be good guidelines for maximizing human utility in a world with a powerful AI, why would that AI not reach that conclusion by inferring human utility functions from observed behavior and modeling the effects of property rights on human utility? If not, then why shouldn’t we accept that the AI has decided on something better than property rights (assuming our other methods of verifying that the AI is optimizing human utility show no flaws)?
Is it because we lack decent methods of verifying the AI’s effects on phenomena such as happiness that are more directly related to our utility functions? If so, it would seem to imply that we have an inadequate understanding of what we mean by maximizing utility. I didn’t see a clear explanation of how the AI would infer utility functions from observing human behavior (maybe the source code, which I haven’t read, clarifies it), but that appears to be roughly how humans at their best make the equivalent moral judgments.
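As I understand it, the paper’s approach amounts to something like inverse reinforcement learning: fit a utility function under which the observed choices look approximately rational. The sketch below is not Tim’s actual algorithm; the softmax choice model, the option names, and the data are my own assumptions, included only to show the shape of the idea:

```python
# Generic sketch of inferring utilities from observed choices (not Tim Freeman's
# actual algorithm). Assumes a softmax (noisy-rational) choice model; the
# observations and option names are made up for illustration.
import math
from collections import defaultdict

observations = [
    # (options available, option actually chosen)
    (["apple", "candy"], "apple"),
    (["apple", "nothing"], "apple"),
    (["candy", "nothing"], "candy"),
]

utilities = defaultdict(float)   # start with all utilities equal to zero
learning_rate = 0.5

def choice_prob(option, options):
    """Probability of picking `option` under a softmax of the current utilities."""
    total = sum(math.exp(utilities[o]) for o in options)
    return math.exp(utilities[option]) / total

# Gradient ascent on the log-likelihood of the observed choices.
for _ in range(200):
    for options, chosen in observations:
        for o in options:
            target = 1.0 if o == chosen else 0.0
            utilities[o] += learning_rate * (target - choice_prob(o, options))

print({option: round(u, 2) for option, u in utilities.items()})
# Roughly: apple > candy > nothing, matching the revealed preferences.
```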
I see similar problems with designing the AI to produce the “correct” result with Pascal’s Wager. Tim says “If Heaven and Hell enter into a decision about buying apples, the outcome seems difficult to predict”. Since humans have a poor track record at thinking rationally about very small probabilities and phenomena such as Heaven that are hard to observe, I wouldn’t expect AI unpredictability in this area to be evidence of a problem. It seems more likely that humans are evaluating Pascal’s Wager incorrectly than that a rational AI which can infer most aspects of human utility functions from human behavior will evaluate it incorrectly.

Book review: Beyond AI: Creating the Conscience of the Machine by J. Storrs Hall
The first two thirds of this book survey current knowledge of AI and make some guesses about when and how it will take off. This part is more eloquent than most books on similar subjects, and its somewhat unconventional perspective makes it worth reading if you are reading several books on the subject. But ease of reading is the only criterion by which this section stands out as better than competing books.
The last five chapters are surprisingly good, and should shame most professional philosophers, whose writings by comparison are a waste of time.
His chapter on consciousness, qualia, and related issues is more concise and persuasive than anything else I’ve read on these subjects. It’s unlikely to change the opinions of people who have already thought about these subjects, but it’s an excellent place for people who are unfamiliar with them to start.
His discussion of ethics using game theory and evolutionary pressures is an excellent way to frame ethical questions.
My biggest disappointment was that he starts to recognize a possibly important risk of AI when he says “disparities among the abilities of AIs … could negate the evolutionary pressure to reciprocal altruism”, but then seems to dismiss that thoughtlessly (“The notion of one single AI taking off and obtaining hegemony over the whole world by its own efforts is ludicrous”).
He probably has semi-plausible grounds for dismissing some of the scenarios of this nature that have been proposed (e.g. the speed at which some people imagine an AI would take off is improbable). But if AIs with sufficiently general purpose intelligence enhance their intelligence at disparate rates for long enough, the results would render most of the book’s discussion of ethics irrelevant. The time it took humans to accumulate knowledge didn’t give Neanderthals much opportunity to adapt. Would the result have been different if Neanderthals had learned to trade with humans? The answer is not obvious, and probably depends on Neanderthal learning abilities in ways that I don’t know how to analyze.
Also, his arguments for optimism aren’t quite as strong as he thinks. His point that career criminals are generally of low intelligence is reassuring if the number of criminals is all that matters. But when the harm done by one relatively smart criminal can be very large (e.g. Mao), it’s hard to say that the number of criminals is all that matters.
Here’s a nice quote from Mencken which this book quotes part of:

Moral certainty is always a sign of cultural inferiority. The more uncivilized the man, the surer he is that he knows precisely what is right and what is wrong. All human progress, even in morals, has been the work of men who have doubted the current moral values, not of men who have whooped them up and tried to enforce them. The truly civilized man is always skeptical and tolerant, in this field as in all others. His culture is based on ‘I am not too sure.’

Another interesting tidbit is the anecdote that H.G. Wells predicted in 1907 that flying machines would be built. In spite of knowing a lot about attempts to build them, he wasn’t aware that the Wright brothers had succeeded in 1903.
If an AI started running in 2003 that has accumulated the knowledge of a 4-year-old human and has the ability to continue learning at human or faster speeds, would we have noticed? Or would the reports we see about it sound too much like the reports of failed AIs for us to pay attention?

Book review: How to Survive a Robot Uprising: Tips on Defending Yourself Against the Coming Rebellion by Daniel H. Wilson
This book combines good analyses of recent robotics research with an understanding of movie scenarios about robot intentions (“how could millions of dollars of special effects lead us astray?”) to produce advice of unknown value about how humans might deal with any malicious robots of the next decade or two.
It focuses mainly on what an ordinary individual or small groups can do to save themselves or postpone their demise, and says little about whether a major uprising can be prevented.
The book’s style is somewhat like the Daily Show’s style, mixing a good deal of accurate reporting with occasional bits of obvious satire (“Robots have no emotions. Sensing your fear could make a robot jealous”), but it doesn’t quite attain the Daily Show’s entertainment value.
Its analyses of the weaknesses of current robot sensors and intelligence should make it required reading for any science fiction author or movie producer who wants to appear realistic (I haven’t been paying enough attention to those fields recently to know whether such people still exist). But it needs a bit of common sense to be used properly. It’s all too easy to imagine a gullible movie producer following its advice to have humans build a time machine and escape to the Cretaceous without pondering whether the robots will use similar time machines to follow them.

Nick Bostrom has a good paper on Astronomical Waste: The Opportunity Cost of Delayed Technological Development, which argues that under most reasonable ethical systems that aren’t completely selfish or very parochial, our philanthropic activities ought to be devoted primarily toward preventing disasters that would cause the extinction of intelligent life.
Some people who haven’t thought about the Fermi Paradox carefully may overestimate the probability that most of the universe is already occupied by intelligent life. Very high estimates for that probability would invalidate Bostrom’s conclusion, but I haven’t found any plausible arguments that would justify that high a probability.
I don’t want to completely dismiss Malthusian objections that life in the distant future will be barely worth living, but the risk of a Malthusian future would need to be well above 50 percent to substantially alter the optimal focus of philanthropy, and the strongest Malthusian arguments that I can imagine leave much more uncertainty than that. (If I thought I could alter the probability of a Malthusian future, maybe I should devote effort to that. But I don’t currently know where to start).
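To make the 50 percent claim concrete, here is the crude expected-value comparison I have in mind. The numbers are placeholders chosen only to show the structure of the argument:

```python
# Crude expected-value sketch of why a moderate Malthusian risk doesn't change
# the philanthropic conclusion much. All values are placeholders for illustration.
value_good_future = 1.0e6        # value (arbitrary units) if the long-term future goes well
value_malthusian_future = 1.0    # value if the future is barely worth living
value_extinction = 0.0           # value if intelligent life goes extinct

for p_malthusian in (0.1, 0.5, 0.9):
    ev_if_we_survive = ((1 - p_malthusian) * value_good_future
                        + p_malthusian * value_malthusian_future)
    gain_from_preventing_extinction = ev_if_we_survive - value_extinction
    print(f"P(Malthusian) = {p_malthusian:.0%}: "
          f"expected gain from preventing extinction = {gain_from_preventing_extinction:,.0f}")
# Even at a 50% Malthusian risk, preventing extinction preserves half of the
# astronomical value, so extinction risk still dominates the calculation.
```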
Thus the conclusion seems like it ought to be too obvious to need repeating, but it’s far enough from our normal experiences that most of us tend to pay inadequate attention to it. So I’m mentioning it in order to remind people (including myself) of the need to devote more of our time to thinking about risks such as those associated with AI or asteroid impacts.