existential risks

Two and a half years ago, Eliezer was (somewhat plausibly) complaining that virtually nobody outside of MIRI was working on AI-related existential risks.

This year (at EAGlobal) one of MIRI’s talks was a bit hard to distinguish from an AI safety talk given by someone with pretty mainstream AI affiliations.

What happened in that time to cause that shift?

A large part of the change was catalyzed by the publication of Superintelligence. I’ve been mildly disappointed by how little it affected discussions among people who were already interested in the topic, but it caused a large increase in the number of people willing to express concern over AI risks. That’s presumably because Superintelligence looks sufficiently academic and neutral for many people to feel comfortable citing it, whereas similar arguments from Eliezer/MIRI didn’t look sufficiently prestigious within academia.

A smaller part of the change was MIRI shifting its focus somewhat to be more in line with how mainstream machine learning (ML) researchers expect AI to reach human levels.

Also, OpenAI has been quietly shifting in a more MIRI-like direction, though I’m very unclear on how big a change this is. (Paul Christiano seems to deserve some credit for both the MIRI and OpenAI shifts in strategy.)

Given those changes, it seems like MIRI ought to be able to attract more donations than before. Especially since it has demonstrated evidence of increasing competence, and also because HPMoR seemed to draw significantly more people into the community of people who are interested in MIRI.

MIRI has gotten one big grant from the Open Philanthropy Project that it probably couldn’t have gotten when mainstream AI researchers were treating MIRI’s concerns as too far-fetched to be worth commenting on. But donations from MIRI’s usual sources have stagnated.

That pattern suggests that MIRI was previously benefiting from a polarization effect, where the perception of two distinct “tribes” (those who care about AI risks versus those who promote AI) energized people to care about “their tribe”.

Now there’s no clear dividing line between MIRI and mainstream researchers. Also, there’s lots of money going into other organizations that plan to do something about AI safety. (Most of those haven’t yet articulated enough of a strategy to make me optimistic that the money is well spent. I still endorse the ideas I mentioned last year in How much Diversity of AGI-Risk Organizations is Optimal?. I’m unclear on how much diversity of approaches we’re getting from the recent proliferation of AI safety organizations.)

That pattern of donations creates perverse incentives for charities to at least market themselves as fighting a powerful group of people, rather than (as an ideal charity would) addressing a neglected problem. Even if that marketing doesn’t distort a charity’s operations, the charity will be tempted to use counterproductive alarmism. AI risk organizations have resisted those temptations (at least recently), but it seems risky to tempt them.

That’s part of why I recently made a modest donation to MIRI, in spite of the uncertainty over the value of their efforts (I had last donated to them in 2009).

This post is partly a response to arguments for only donating to one charity and to an 80,000 Hours post arguing against diminishing returns. But I’ll focus mostly on AGI-risk charities.

Diversifying Donations?

The rule that I should only donate to one charity is a good presumption to start with. Most objections to it are due to motivations that diverge from pure utilitarian altruism. I don’t pretend that altruism is my only motive for donating, so I’m not too concerned that I only do a rough approximation of following that rule.

Still, I want to follow the rule more closely than most people do. So when I direct less than 90% of my donations to tax-deductible nonprofits, I feel a need to point to diminishing returns [1] to donations to justify that.
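
A minimal way to formalize the role of diminishing returns (my own framing, not taken from anything I’m responding to): a purely utilitarian donor with budget D solves

```latex
\max_{d_1,\dots,d_n \ge 0} \; \sum_{i} U_i(d_i)
\quad \text{subject to} \quad \sum_{i} d_i = D
```

where U_i(d_i) is the value produced by giving d_i to charity i. If each U_i is linear, the optimum puts everything into the charity with the highest marginal value, which is the one-charity rule. If the U_i are strictly concave (diminishing returns), an interior optimum equalizes marginal value (U_i'(d_i) = λ) across every charity that receives money, so splitting donations can be optimal.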

With AGI risk organizations, I expect the value of diversity to sometimes override the normal presumption even for purely altruistic utilitarians (with caveats about having the time needed to evaluate multiple organizations, and having more than a few thousand dollars to donate; those caveats will exclude many people from this advice, so this post is mainly oriented toward EAs who are earning to give or wealthier people).

Diminishing Returns?

Before explaining that, I’ll reply to the 80,000 Hours post about diminishing returns.

The 80,000 Hours post focuses on charities that mostly market causes to a wide audience. The economies of scale associated with brand recognition and social proof seem more plausible than any economies of scale available to research organizations.

The shortage of existential risk research seems more dangerous than any shortage of charities which are devoted to marketing causes, so I’m focusing on the most important existential risk.

I expect diminishing returns to be common after an organization grows beyond two or three people. One reason is that the founders of most organizations exert more influence than subsequent employees over important policy decisions [2], so at productive organizations founders are more valuable.

For research organizations that need the smartest people, the limited number of such people implies that only small organizations can have a large fraction of employees be highly qualified.

I expect donations to very young organizations to be more valuable than other donations (which implies diminishing returns to size on average):

  • It takes time to produce evidence that the organization is accomplishing something valuable, and donors quite sensibly prefer organizations that have provided such evidence.
  • Even when donors try to compensate for that by evaluating the charity’s mission statement or leader’s competence, it takes some time to adequately communicate those features (e.g. it’s rare for a charity to set up an impressive web site on day one).
  • It’s common for a charity to have suboptimal competence at fundraising until it grows large enough to hire someone with fundraising expertise.
  • Some charities are mainly funded by a few grants in the millions of dollars, and I’ve heard reports that those grants often take many months between being awarded and reaching the charity’s bank account (not to mention delays in awarding the grants). This sometimes means months during which a charity has trouble hiring anyone who demands an immediate salary.
  • Donors could in principle overcome these causes of bias, but as far as I can tell, few care about doing so. EAs come a little closer to doing this than others, but my observations suggest that EAs are almost as lazy about analyzing new charities as non-EAs.
  • Therefore, I expect young charities to be underfunded.

Why AGI risk research needs diversity

I see more danger of researchers pursuing useless approaches with existential risks in general, and AGI risks in particular (due partly to the inherent lack of feedback), than with other causes.

The most obvious way to reduce that danger is to encourage a wide variety of people and organizations to independently research risk mitigation strategies.

I worry about AGI-risk researchers focusing all their effort on a class of scenarios which rely on a false assumption.

The AI foom debate seems superficially like the main area where a false assumption might cause AGI-risk research to end up mostly wasted. But there are enough influential people on both sides of this issue that I expect research to not ignore one side of the debate for long.

I worry more about assumptions that no prominent people question.

I’ll describe how such an assumption might look in hindsight via an analogy to some leading developers of software intended to accomplish what the web ended up accomplishing [3].

Xanadu stood out as the leading developer of global hypertext software in the 1980s to about the same extent that MIRI stands out as the leading AGI-risk research organization. One reason [4] that Xanadu accomplished little was the assumption that they needed to make money. Part of why that seemed obvious in the 1980s was that there were no ISPs delivering an internet-like platform to ordinary people, and hardware costs were a big obstacle to anyone who wanted to provide that functionality. The hardware costs declined at a predictable enough rate that Drexler was able to predict in Engines of Creation (published in 1986) that ordinary people would get web-like functionality within a decade.

A more disturbing reason for assuming that web functionality needed to make a profit was the ideology surrounding private property. People who opposed private ownership of homes, farms, factories, etc. were causing major problems. Most of us automatically treated ownership of software as working the same way as ownership of physical property.

People who are too young to remember attitudes toward free / open source software before about 1997 will have some trouble believing how reluctant people were to imagine valuable software being free. [5] Attitudes changed unusually fast due to the demise of communism and the availability of affordable internet access.

A few people (such as RMS) overcame the focus on cold war issues, but were too eccentric to convert many followers. We should pay attention to people with similarly eccentric AGI-risk views.

If I had to guess what faulty assumption AGI-risk researchers are making, I’d guess it involves mistaken ideas about the nature of intelligence or the architecture of feasible AGIs. But the assumptions that look suspicious to me are ones that some moderately prominent people have already questioned.

Vague intuitions along these lines have led me to delay some of my potential existential-risk donations, in hopes that I’ll discover (or help create?) new existential-risk projects that produce more value per dollar.

Conclusions

How does this affect my current giving pattern?

My favorite charity is CFAR (around 75 or 80% of my donations), which improves the effectiveness of people who might start new AGI-risk organizations or AGI-development organizations. I’ve had varied impressions about whether additional donations to CFAR have had diminishing returns. They seem to have been getting just barely enough money to hire employees they consider important.

FLI is a decent example of a possibly valuable organization that CFAR played some hard-to-quantify role in starting. It bears a superficial resemblance to an optimal incubator for additional AGI-risk research groups. But FLI seems too focused on mainstream researchers to have much hope of finding the eccentric ideas that I’m most concerned about AGI-researchers overlooking.

Ideally I’d be donating to one or two new AGI-risk startups per year. Conditions seem almost right for this: new AGI-risk organizations are being created at a good rate, but they’re mostly getting a few large grants, which probably encourages them to focus on relatively mainstream views [6].

CSER and FLI sort of fit this category briefly last year before getting large grants, and I donated moderate amounts to them. I presume I didn’t give enough to them for diminishing returns to be important, but their windows of unusual need were short enough that I might well have come close to that.

I’m a little surprised that the increasing interest in this area doesn’t seem to be catalyzing the formation of more low-budget groups pursuing more unusual strategies. Please let me know of any that I’m overlooking.

See my favorite charities web page (recently updated) for more thoughts about specific charities.

[1] – Diminishing returns are the main way that donating to multiple charities at one time can be reconciled with utilitarian altruism.

[2] – I don’t know whether it ought to work this way, but I expect this pattern to continue.

[3] – they intended to accomplish a much more ambitious set of goals.

[4] – probably not the main reason.

[5] – presumably the people who were sympathetic to communism weren’t attracted to small software projects (too busy with politics?) or rejected working on software due to the expectation that it required working for evil capitalists.

[6] – The short-term effects are probably good, increasing the diversity of approaches compared to what would be the case if MIRI were the only AGI-risk organization, and reducing the risk that AGI researchers would become polarized into tribes that disagree about whether AGI is dangerous. But a field dominated by a few funders tends to focus on fewer ideas than one with many funders.

I’d like to see more discussion of uploaded ape risks.

There is substantial disagreement over how fast an uploaded mind (em) would improve its abilities or the abilities of its progeny. I’d like to start by analyzing a scenario where it takes between one and ten years for an uploaded bonobo to achieve human-level cognitive abilities. This scenario seems plausible, although I’ve selected it more to illustrate a risk that can be mitigated than because of arguments about how likely it is.

I claim we should anticipate at least a 20% chance that a human-level bonobo-derived em would improve at least as quickly as a human em that was uploaded later.

Considerations that weigh in favor of this: bonobo minds seem to be about as general-purpose as human minds, including near-human language ability; and the ease with which ems can interface with other software will likely enable them to learn new skills faster than biological minds will.

The most concrete evidence that weighs against this is the modest correlation between IQ and brain size. It’s somewhat plausible that it’s hard to usefully add many neurons to an existing mind, and that bonobo brain size represents an important cognitive constraint.

I’m not happy about analyzing what happens when another species develops more powerful cognitive abilities than humans, so I’d prefer to have some humans upload before the bonobos become superhuman.

A few people worry that uploading a mouse brain will generate enough understanding of intelligence to quickly produce human-level AGI. I doubt that biological intelligence is simple / intelligible enough for that to work. So I focus more on small tweaks: the kind of social pressures which caused the Flynn Effect in humans, selective breeding (in the sense of making many copies of the smartest ems, with small changes to some copies), and faster software/hardware.

The risks seem dependent on the environment in which the ems live and on the incentives that might drive their owners to improve em abilities. The most obvious motives for uploading bonobos (research into problems affecting humans, and into human uploading) create only weak incentives to improve the ems. But there are many other possibilities: military use, interesting NPCs, or financial companies looking for interesting patterns in large databases. No single one of those looks especially likely, but with many ways for things to go wrong, the risks add up.

What could cause a long window between bonobo uploading and human uploading? Ethical and legal barriers to human uploading, motivated by risks to the humans being uploaded and by concerns about human ems driving human wages down.

What could we do about this risk?

Political activism may mitigate the risks of hostility to human uploading, but if done carelessly it could create a backlash which worsens the problem.

Conceivably safety regulations could restrict em ownership/use to people with little incentive to improve the ems, but rules that looked promising would still leave me worried about risks such as irresponsible people hacking into computers that run ems and stealing copies.

A more sophisticated approach is to improve the incentives to upload humans. I expect the timing of the first human uploads to be fairly sensitive to whether we have legal rules which enable us to predict who will own em labor. But just writing clear rules isn’t enough – how can we ensure political support for them at a time when we should expect disputes over whether ems are people?

We could also find ways to delay ape uploading. But most ways of doing that would also delay human uploading, which creates tradeoffs that I’m not too happy with (partly due to my desire to upload before aging damages me too much).

If a delay between bonobo and human uploading is dangerous, then we should also ask about dangers from other uploaded species. My intuition says the risks are much lower, since it seems like there are few technical obstacles to uploading a bonobo brain shortly after uploading mice or other small vertebrates.

But I get the impression that many people associated with MIRI worry about risks of uploaded mice, and I don’t have strong evidence that I’m wiser than they are. I encourage people to develop better analyses of this issue.

Book review: Artificial Superintelligence: A Futuristic Approach, by Roman V. Yampolskiy.

This strange book has some entertainment value, and might even enlighten you a bit about the risks of AI. It presents many ideas, with occasional attempts to distinguish the important ones from the jokes.

I had hoped for an analysis that reflected a strong understanding of which software approaches were most likely to work. Yampolskiy knows something about computer science, but doesn’t strike me as someone with experience at writing useful code. His claim that “to increase their speed [AIs] will attempt to minimize the size of their source code” sounds like a misconception that wouldn’t occur to an experienced programmer. And his chapter “How to Prove You Invented Superintelligence So No One Else Can Steal It” seems like a cute game that someone might play if he cared more about passing a theoretical computer science class than about, say, making money on the stock market, or making sure the superintelligence didn’t destroy the world.
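
To illustrate why that claim is a misconception, here’s a toy comparison of my own (not from the book): the version with less source code is drastically slower, because speed usually comes from better algorithms and caching, which tend to add code rather than remove it.

```python
import functools
import timeit

def fib_short(n):
    # Minimal source code, but exponential running time.
    return n if n < 2 else fib_short(n - 1) + fib_short(n - 2)

@functools.lru_cache(maxsize=None)
def fib_cached(n):
    # More code (a cache decorator), but linear running time.
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)

if __name__ == "__main__":
    print(timeit.timeit(lambda: fib_short(25), number=10))   # orders of magnitude slower
    print(timeit.timeit(lambda: fib_cached(25), number=10))  # nearly instant
```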

I’m still puzzling over some of his novel suggestions for reducing AI risks. How would “convincing robots to worship humans as gods” differ from the proposed Friendly AI? Would such robots notice (and resolve in possibly undesirable ways) contradictions in their models of human nature?

Other suggestions are easy to reject, such as hoping AIs will need us for our psychokinetic abilities (abilities that Yampolskiy says are shown by peer-reviewed experiments associated with the Global Consciousness Project).

The style is also weird. Some chapters were previously published as separate papers, and weren’t adapted to fit together. It was annoying to occasionally see sentences that seemed identical to ones in a prior chapter.

The author even has strange ideas about what needs footnoting. E.g. when discussing the physical limits to intelligence, he cites (Einstein 1905).

Only read this if you’ve read other authors on this subject first.

Book review: Superintelligence: Paths, Dangers, Strategies, by Nick Bostrom.

This book is substantially more thoughtful than previous books on AGI risk, and substantially better organized than the previous thoughtful writings on the subject.

Bostrom’s discussion of AGI takeoff speed is disappointingly philosophical. Many sources (most recently CFAR) have told me to rely on the outside view to forecast how long something will take. We’ve got lots of weak evidence about the nature of intelligence, how it evolved, and about how various kinds of software improve, providing data for an outside view. Bostrom assigns a vague but implausibly high probability to AI going from human-equivalent to more powerful than humanity as a whole in days, with little thought of this kind of empirical check.

I’ll discuss this more in a separate post which is more about the general AI foom debate than about this book.

Bostrom’s discussion of how takeoff speed influences the chance of a winner-take-all scenario makes it clear that disagreements over takeoff speed are pretty much the only cause of my disagreement with him over the likelihood of a winner-take-all outcome. Other writers aren’t as clear about this. I suspect those who assign substantial probability to a winner-take-all outcome even if takeoff is slow will wish he’d analyzed this in more detail.

I’m less optimistic than Bostrom about monitoring AGI progress. He says “it would not be too difficult to identify most capable individuals with a long-standing interest in [AGI] research”. AGI might require enough expertise for that to be true, but if AGI surprises me by only needing modest new insights, I’m concerned by the precedent of Tim Berners-Lee creating a global hypertext system while barely being noticed by the “leading” researchers in that field. Also, the large number of people who mistakenly think they’ve been making progress on AGI may obscure the competent ones.

He seems confused about the long-term trends in AI researcher beliefs about the risks: “The pioneers of artificial intelligence … mostly did not contemplate the possibility of greater-than-human AI” seems implausible; it’s much more likely they expected it but were either overconfident about it producing good results or fatalistic about preventing bad results (“If we’re lucky, they might decide to keep us as pets” – Marvin Minsky, LIFE Nov 20, 1970).

The best parts of the book clarify many issues related to ensuring that an AGI does what we want.

He catalogs more approaches to controlling AGI than I had previously considered, including tripwires, oracles, and genies, and clearly explains many limits to what they can accomplish.

He briefly mentions the risk that the operator of an oracle AI would misuse it for her personal advantage. Why should we have less concern about the designers of other types of AGI giving them goals that favor the designers?

If an oracle AI can’t produce a result that humans can analyze well enough to decide (without trusting the AI) that it’s safe, why would we expect other approaches (e.g. humans writing the equivalent seed AI directly) to be more feasible?

He covers a wide range of ways we can imagine handling AI goals, including strange ideas such as telling an AGI to use the motivations of superintelligences created by other civilizations.

He does a very good job of discussing what values we should and shouldn’t install in an AGI: the best decision theory plus a “do what I mean” dynamic, but not a complete morality.

I’m somewhat concerned by his use of “final goal” without careful explanation. People who anthropomorphise goals are likely to interpret at least the first few references to “final goal” as if it worked like a human goal, i.e. something that the AI might want to modify if it conflicted with other goals.

It’s not clear how much of these chapters depends on a winner-take-all scenario. I get the impression that Bostrom doubts we can do much about the risks associated with scenarios where multiple AGIs become superhuman. This seems strange to me. I want people who write about AGI risks to devote more attention to whether we can influence whether multiple AGIs become a singleton, and how they would treat lesser intelligences. Designing AGIs to reflect the values we want seems almost as desirable in scenarios with multiple AGIs as in the winner-take-all scenario (I’m unsure what Bostrom thinks about that). In a world with many AGIs with unfriendly values, what can humans do to bargain for a habitable niche?

He has a chapter on worlds dominated by whole brain emulations (WBE), probably inspired by Robin Hanson’s writings but with more focus on evaluating risks than on predicting the most probable outcomes. Since it looks like we should still expect an em-dominated world to be replaced at some point by AGI(s) that are designed more cleanly and able to self-improve faster, this isn’t really an alternative to the scenarios discussed in the rest of the book.

He treats starting with “familiar and human-like motivations” (in an augmentation route) as an advantage. Judging from our experience with humans who take over large countries, a human-derived intelligence that conquered the world wouldn’t be safe or friendly, although it would be closer to my goals than a smiley-face maximizer. The main advantage I see in a human-derived superintelligence would be a lower risk of it self-improving fast enough for the frontrunner advantage to be large. But that also means it’s more likely to be eclipsed by a design more amenable to self-improvement.

I’m suspicious of the implication (figure 13) that the risks of WBE will be comparable to AGI risks.

  • Is that mainly due to “neuromorphic AI” risks? Bostrom’s description of neuromorphic AI is vague, but my intuition is that human intelligence isn’t flexible enough to easily get the intelligence part of WBE without getting something moderately close to human behavior.
  • Is the risk of uploaded chimp(s) important? I have some concerns there, but Bostrom doesn’t mention it.
  • How about the risks of competitive pressures driving out human traits (discussed more fully/verbosely at Slate Star Codex)? If WBE and AGI happen close enough together in time that we can plausibly influence which comes first, I don’t expect the time between the two to be long enough for that competition to have large effects.
  • The risk that many humans won’t have enough resources to survive? That’s scary, but wouldn’t cause the astronomical waste of extinction.

Also, I don’t accept his assertion that AGI before WBE eliminates the risks of WBE. Some scenarios with multiple independently designed AGIs forming a weakly coordinated singleton (which I consider more likely than Bostrom does) appear to leave the last two risks in that list unresolved.

This book represents progress toward clear thinking about AGI risks, but much more work still needs to be done.

Book review: Our Mathematical Universe: My Quest for the Ultimate Nature of Reality, by Max Tegmark.

His most important claim is the radical Platonist view that all well-defined mathematical structures exist, therefore most physics is the study of which of those we inhabit. His arguments are more tempting than any others I’ve seen for this view, but I’m left with plenty of doubt.

He points to ways that we can imagine this hypothesis being testable, such as via the fine-tuning of fundamental constants. But he doesn’t provide a good reason to think that those tests will distinguish his hypothesis from other popular approaches, as it’s easy to imagine that we’ll never find situations where they make different predictions.

The most valuable parts of the book involve the claim that the multiverse is spatially infinite. He mostly talks as if that’s likely to be true, but his explanations caused me to lower my probability estimate for that claim.

He gets that infinity by claiming that inflation continues in places for infinite time, and then claiming there are reference frames for which that infinite time is located in a spatial rather than a time direction. I have a vague intuition why that second step might be right (but I’m fairly sure he left something important out of the explanation).

For the infinite time part, I’m stuck with relying on argument from authority, without much evidence that the relevant authorities have much confidence in the claim.

Toward the end of the book he mentions reasons to doubt infinities in physics theories – it’s easy to find examples where we model substances such as air as infinitely divisible, when we know that at some levels of detail atomic theory is more accurate. The eternal inflation theory depends on an infinitely expandable space which we can easily imagine is only an approximation. Plus, when physicists explicitly ask whether the universe will last forever, they don’t seem very confident. I’m also tempted to say that the measure problem (i.e. the absence of a way to say some events are more likely than others if they all happen an infinite number of times) is a reason to doubt infinities, but I don’t have much confidence that reality obeys my desire for it to be comprehensible.

I’m disappointed by his claim that we can get good evidence that we’re not Boltzmann brains. He wants us to test our memories, because if I am a Boltzmann brain I’ll probably have a bunch of absurd memories. But suppose I remember having done that test in the past few minutes. The Boltzmann brain hypothesis suggests it’s much more likely for me to have randomly acquired the memory of having passed the test than for me to have actually done the test. Maybe there’s a way to turn Tegmark’s argument into something rigorous, but it isn’t obvious.

He gives a surprising argument that the differences between the Everett and Copenhagen interpretations of quantum mechanics don’t matter much, because unrelated reasons involving multiverses lead us to expect results comparable to the Everett interpretation even if the Copenhagen interpretation is correct.

It’s a bit hard to figure out what the book’s target audience is – he hides the few equations he uses in footnotes to make it look easy for laymen to follow, but he also discusses hard concepts such as universes with more than one time dimension with little attempt to prepare laymen for them.

The first few chapters are intended for readers with little knowledge of physics. One theme is a historical trend which he mostly describes as expanding our estimate of how big reality is. But the evidence he provides only tells us that the lower bounds that people give keep increasing. Looking at the upper bound (typically infinity) makes that trend look less interesting.

The book has many interesting digressions such as a description of how to build Douglas Adams’ infinite improbability drive.

Book review: Our Final Invention: Artificial Intelligence and the End of the Human Era by James Barrat.

This book describes the risks that artificial general intelligence will cause human extinction, presenting the ideas propounded by Eliezer Yudkowsky in a slightly more organized but less rigorous style than Eliezer has.

Barrat is insufficiently curious about why many people who claim to be AI experts disagree, so he’ll do little to change the minds of people who already have opinions on the subject.

He dismisses critics as unable or unwilling to think clearly about the arguments. My experience suggests that while there’s usually some argument that any one critic hasn’t paid much attention to, that’s often because the critic has rejected, after some thought, some other step in Eliezer’s reasoning, and concluded that the step they’re ignoring wouldn’t change their conclusions.

The weakest claim in the book is that an AGI might become superintelligent in hours. A large fraction of people who have worked on AGI (e.g. Eric Baum’s What is Thought?) dismiss this as too improbable to be worth much attention, and Barrat doesn’t offer them any reason to reconsider. The rapid takeoff scenarios influence how plausible it is that the first AGI will take over the world. Barrat seems only interested in talking to readers who can be convinced we’re almost certainly doomed if we don’t build the first AGI right. Why not also pay some attention to the more complex situation where an AGI takes years to become superhuman? Should people who think there’s a 1% chance of the first AGI conquering the world worry about that risk?

Some people don’t approve of trying to build an immutable utility function into an AGI, often pointing to changes in human goals without clearly analyzing whether those are subgoals that are being altered to achieve a stable supergoal/utility function. Barrat mentions one such person, but does little to analyze this disagreement.

Would an AGI that has been designed without careful attention to safety blindly follow a narrow interpretation of its programmed goal(s), or would it (after achieving superintelligence) figure out and follow the intentions of its authors? People seem to jump to whatever conclusion supports their attitude toward AGI risk without much analysis of why others disagree, and Barrat follows that pattern.

I can imagine either possibility. If the easiest way to encode a goal system in an AGI is something like “output chess moves which according to the rules of chess will result in checkmate”, then the AGI might pursue that goal literally (turning the planet into computronium might help satisfy that goal).

An apparently harder approach would have the AGI consult a human arbiter to figure out whether it wins the chess game – “human arbiter” isn’t easy to encode in typical software. But AGI wouldn’t be typical software. It’s not obviously wrong to believe that software smart enough to take over the world would be smart enough to handle hard concepts like that. I’d like to see someone pin down people who think this is the obvious result and get them to explain how they imagine the AGI handling the goal before it reaches human-level intelligence.
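
A toy sketch of that contrast (my own illustration, not from the book; it assumes the python-chess library, and `ask_human` is just a hypothetical placeholder):

```python
import chess  # python-chess


def formal_goal_satisfied(board: chess.Board) -> bool:
    # A goal fully specified by the rules of chess: the opponent is checkmated.
    # Nothing here refers to what the programmers actually wanted.
    return board.is_checkmate()


def arbiter_goal_satisfied(board: chess.Board, ask_human) -> bool:
    # A goal that defers to a human arbiter. The hard part is that `ask_human`
    # stands in for a concept ("a human judges that I won, in the way they
    # intended") that nobody knows how to encode directly in software.
    return ask_human("Did I win this game in the way you intended?", board)
```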

He mentions some past events that might provide analogies for how AGI will interact with us, but I’m disappointed by how little thought he puts into this.

His examples of contact between technologically advanced beings and less advanced ones all refer to Europeans contacting Native Americans. I’d like to have seen a wider variety of analogies, e.g.:

  • Japan’s contact with the west after centuries of isolation
  • the interaction between neanderthals and humans
  • the contact that resulted in mitochondria becoming part of our cells

He quotes Vinge saying an AGI ‘would not be humankind’s “tool” – any more than humans are the tools of rabbits or robins or chimpanzees.’ I’d say that humans are sometimes the tools of human DNA, which raises more complex questions of how well the DNA’s interests are served.

The book contains many questionable digressions which seem to be designed to entertain.

He claims Google must have an AGI project in spite of denials by Google’s Peter Norvig (this was before it bought DeepMind). But the evidence he uses to back up this claim is that Google thinks something like AGI would be desirable. The obvious conclusion would be that Google did not then think it had the skill to usefully work on AGI, which would be a sensible position given the history of AGI.

He thinks there’s something paradoxical about Eliezer Yudkowsky wanting to keep some information about himself private while putting lots of personal information on the web. The specific examples Barrat gives strongly suggest that Eliezer doesn’t value the standard notion of privacy, but wants to limit people’s ability to distract him. Barrat also says Eliezer “gave up reading for fun several years ago”, which will surprise those who see him frequently mention works of fiction in his Author’s Notes.

All this makes me wonder who the book’s target audience is. It seems to be someone less sophisticated than a person who could write an AGI.

Discussions asking whether “Snowball Earth” triggered animal evolution (see the bottom half of that page) suggest there’s increasing evidence that the Snowball Earth hypothesis may explain an important part of why spacefaring civilizations seem rare.

photosynthetic organisms are limited by nutrients, most often nitrogen or phosphorus

the glaciations led to high phosphorus concentrations, which led to high productivity, which led to high oxygen in the oceans and atmosphere, which allowed for animal evolution to be triggered and thus the rise of the metazoans.

This seems quite speculative, but if true it might mean that our planet needed a snowball earth effect for complex life to evolve, but also needed that snowball earth period to be followed by hundreds of millions of years without another snowball earth period that would wipe out complex life. It’s easy to imagine that the conditions needed to produce one snowball earth effect make it very unusual for the planet to escape repeated snowball earth events for as long as it did, thus explaining more of the Fermi paradox than seemed previously possible.

The most interesting talk at the Singularity Summit 2010 was Shane Legg’s description of an Algorithmic Intelligence Quotient (AIQ) test that measures something intelligence-like automatically, in a way that can test AI programs (or at least the Monte-Carlo AIXI that he uses) on 1000+ environments.

He had a mathematical formula which he thinks rigorously defines intelligence. But he didn’t specify what he meant by the set of possible environments, saying that would require a 50-page paper (he said a good deal of the work on the test had been done last week, so presumably he’s still working on the project). He also included a term that applies Occam’s razor, which I didn’t completely understand, but it seems likely that it should have a fairly non-controversial effect.
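
For reference, here is the universal intelligence measure from Legg and Hutter’s earlier work, which I assume is the formula he was referring to (I may be wrong about which variant he presented):

```latex
\Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
```

Here E is the set of computable reward-generating environments, V^π_μ is the expected total reward that agent π earns in environment μ, and 2^{-K(μ)} is the Occam’s razor term: K(μ) is the Kolmogorov complexity of the environment, so simpler environments get exponentially more weight.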

The environments sound like they imitate individual questions on an IQ test, but with a much wider range of difficulties. We need a more complete description of the set of environments he uses in order to evaluate whether they’re heavily biased toward what Monte-Carlo AIXI does well or whether they closely resemble the environments an AI will find in the real world. He described two reasons for having some confidence in his set of environments: different subsets provided roughly similar results, and a human taking a small subset of the test found some environments easy, some very challenging, and some too hard to understand.

It sounds like with a few more months worth of effort, he could generate a series of results that show a trend in the AIQ of the best AI program in any given year, and also the AIQ of some smart humans (although he implied it would take a long time for a human to complete a test). That would give us some idea of whether AI workers have been making steady progress, and if so when the trend is likely to cross human AIQ levels. An educated guess about when AI will have a major impact on the world should help a bit in preparing for it.

A more disturbing possibility is that this test will be used as a fitness function for genetic programming. Given sufficient computing power, that looks likely to generate superhuman intelligence that is almost certainly unfriendly to humans. I’m confident that sufficient computing power is not available yet, but my confidence will decline over time.
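
To make that worry concrete, here is a minimal sketch of how an AIQ-style score could be used as a fitness function in an ordinary evolutionary search loop. Everything here is hypothetical: `aiq_score` and `mutate` are stand-ins, since Legg’s test and any real program representation aren’t specified.

```python
import random

def aiq_score(program) -> float:
    """Hypothetical stand-in for an AIQ-style test: run `program` on many
    sampled environments and return its average reward. Not a real API."""
    raise NotImplementedError

def mutate(program):
    """Hypothetical: return a randomly tweaked copy of `program`."""
    raise NotImplementedError

def evolve(initial_population, generations=1000, survivors=10, children_per_survivor=10):
    population = list(initial_population)
    for _ in range(generations):
        # Rank the current programs by their AIQ-style score...
        population.sort(key=aiq_score, reverse=True)
        parents = population[:survivors]
        # ...and build the next generation from mutated copies of the best ones.
        population = parents + [
            mutate(random.choice(parents))
            for _ in range(survivors * children_per_survivor)
        ]
    return max(population, key=aiq_score)
```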

Brian Wang has a few more notes on this talk.