MIRI

All posts tagged MIRI

Book review: If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All, by Eliezer Yudkowsky and Nate Soares.

[This review is written with a Goodreads audience in mind, more so than my usual posts. I will write a more LessWrong-oriented post with a more detailed description of the ways in which the book looks overconfident.]

If you’re not at least mildly worried about AI, Part 1 of this book is essential reading.

Please read If Anyone Builds It, Everyone Dies (IABIED) with Clarke’s First Law in mind (“When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.”). The authors are overconfident in dismissing certain safety strategies. But their warnings about what is possible ought to worry us.

I encourage you to (partly) judge the book by its cover: dark, implausibly certain of doom, and endorsed by a surprising set of national security professionals who had previously been very quiet about this topic. But only one Nobel Prize winner.

Will AI Be Powerful Soon?

The first part of IABIED focuses on what seems to be the most widespread source of disagreement: will AI soon become powerful enough to conquer us?

There are no clear obstacles to AIs becoming broadly capable of outsmarting us.

AI developers only know how to instill values that roughly approximate the values that they intend to instill.

Maybe the AIs will keep us as pets for a while, but they’ll have significant abilities to design entities that better satisfy what the AIs want from their pets. So unless we train the AIs such that we’re their perfect match for a pet, they may discard us for better models.

For much of Part 1, IABIED is taking dangers that experts mostly agree are real, and concluding that the dangers are much worse than most experts believe. IABIED’s arguments seem relatively weak when they’re most strongly disagreeing with more mainstream experts. But the book’s value doesn’t depend very much on the correctness of those weaker arguments, since merely reporting the beliefs of experts at AI companies would be enough for the book to qualify as alarmist.

I’m pretty sure that over half the reason people are skeptical of claims such as those IABIED makes is that they expect technology to be consistently overhyped.

It’s pretty understandable that a person who has not focused much attention on AI assumes it will work out like a typical technology.

An important lesson for becoming a superforecaster is to start from the assumption that nothing ever happens: the future will mostly be like the past, and a large fraction of the claims that excite the news media turn out not to matter for forecasting, even though the media try to get your attention by persuading you that they do matter.

The heuristic that nothing ever happens has improved my ability to make money off the stock market, but the exceptions to that heuristic are still painful.

The most obvious example is COVID. I was led into complacency by a century of pandemics that caused less harm to the US than alarmists had led us to expect.

Another example involves hurricane warnings. The news media exaggerate the dangers of typical storms enough that when a storm such as Katrina comes along, viewers and newscasters alike find it hard to take accurate predictions seriously.

So while you should start with a pretty strong presumption that apocalyptic warnings are hype, it’s important to be able to change your mind about them.

What evidence is there that AI is exceptional enough that you should evaluate it carefully?

The easiest piece of news to understand is that Geoffrey Hinton, who won a Nobel Prize for helping AI get to where it is today, worries that his life’s work was a mistake.

There’s lots of other evidence. IABIED points to many ways in which AI has exceeded human abilities as fairly good evidence of what might be possible for AI. Alas, there’s no simple analysis that tells us what’s likely.

If I were just starting to learn about AI, I’d feel pretty confused as to how urgent the topic is. But I’ve been following it for a long time. E.g. I wrote my master’s thesis in 1993 on neural nets, correctly predicting that they would form the foundation for AI. So you should consider my advice on this topic to be better than random. I’m telling you that something very important is happening.

How Soon?

I’m concerned that IABIED isn’t forceful enough about the “soon” part.

A wide variety of measures of AI progress (e.g. these graphs, but also my informal estimates of how wide a variety of tasks AI can handle) have convinced me that AI will soon be powerful. Many trend lines suggest AI will surpass humans in the early 2030s.

Others have tried the general approach of using such graphs to convince people, with unclear results. But this is one area where IABIED carefully avoids overconfidence.

Part 2 describes a detailed, somewhat plausible scenario of how an AI might defeat humanity. This part of the book shouldn’t be important, but probably some readers will get there and be surprised to realize that the authors really meant it when they said that AI will be powerful.

A few details of the scenario sound implausible. I agree with the basic idea that it would be unusually hard to defend against an AI attack. Yet it seems hard to describe a really convincing scenario.

A more realistic scenario would likely sound a good deal more mundane. I’d expect persuasion, blackmail, getting control of drone swarms, and a few other things like that. The ASI would combine them in ways that rely on evidence too complex to fit in a human mind. Including such a scenario in the book would have been futile, because skeptics wouldn’t come close to understanding why the strategy would work.

AI Company Beliefs

What parts of this book do leaders of AI companies disagree with? I’m fairly sure that they mostly agree that Part 1 of IABIED points to real risks. Yet they mostly reject the conclusion of the book’s title.

Eight years ago I wrote some speculations on roughly this topic. The main point that has changed since then is that believing “the risks are too distant” has become evidence that the researcher is working on a failed approach to AI.

This time I’ll focus mainly on the leaders of the four or so labs that have produced important AIs. They all seem to have admitted at some point that their strategies are a lot like playing Russian Roulette, for a decent shot at creating utopia.

What kind of person is able to become such a leader? It clearly requires both unusual competence and some recklessness.

I feel fairly confused as to whether they’ll become more cautious as their AIs become more powerful. I see a modest chance that they are accurately predicting which of their AIs will be too weak to cause a catastrophe, and that they will pivot before it’s too late. The stated plans of AI companies are not at all reassuring. Yet they likely understand the risks better than does anyone who might end up regulating AI.

Policies

I want to prepare for a possible shutdown of AI development circa 2027. That’s when my estimate of its political feasibility gets up to about 30%.

I don’t want a definite decision on a shutdown right now. I expect that AIs of 2027 will give us better advice than we have today as to whether a shutdown is wise, and how draconian it needs to be. (IABIED would likely claim that we can’t trust those AIs. That seems to reflect an important disagreement about how AI will work as it approaches human levels.)

Advantages of waiting a bit:

  • better AIs to help enforce the shutdown; in particular, better ability to reliably evaluate whether something violates the shutdown
  • better AIs to help decide how long the shutdown needs to last

I think I’m a bit more optimistic than IABIED about AI companies’ ability to judge whether their next version will be dangerously powerful.

I’m nervous about labeling IABIED’s proposal as a shutdown, when current enforcement abilities are rather questionable. It seems easier for AI research to evade restrictions than is the case with nuclear weapons. Developers who evade the law are likely to take less thoughtful risks than what we’re currently on track for.

I’m hoping that with AI support in 2027 it will be possible to regulate the most dangerous aspects of AI progress while leaving some capability progress intact, such as restricting research that increases AI agentiness but not research that advances prediction ability. I see current trends as on track to produce superhuman prediction before they produce superhuman steering abilities. AI companies could do more than they currently do to increase the difference between those two categories (see Drexler’s CAIS for hints). And most of what we need for safety is superhuman predictions of which strategies have which risks (IABIED clearly disagrees with that claim).

IABIED thinks that the regulations they propose would delay ASI by decades. I’m unclear how confident they are about that prediction. It seems important to have doubts about how much of a delay is feasible.

A key component of their plan involves outlawing some AI research publications. That is a tricky topic, and their strategy is less clearly explained than I had hoped.

I’m reminded of a time in the late 20th century, when cryptography was regulated in a way that led to t-shirts describing the RSA algorithm being classified as a munition that could not be exported. Needless to say, that regulation was not very effective. This helps illustrate why restricting software innovation is harder than a casual reader would expect.
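
To give a sense of how little there is to restrict, here is a toy RSA sketch in Python. This is my illustration, not the Perl code from the actual t-shirts, and it uses deliberately insecure toy parameters; the point is only that the whole algorithm is a handful of lines of modular arithmetic, which is part of why treating it as a munition was so hard to enforce.

```python
# Toy RSA, for illustration only: tiny primes, no padding, not secure.
# The point is just that the algorithm is compact enough to print on a shirt.
p, q = 61, 53                 # real keys use primes hundreds of digits long
n, phi = p * q, (p - 1) * (q - 1)
e = 17                        # public exponent, chosen coprime to phi
d = pow(e, -1, phi)           # private exponent (modular inverse, Python 3.8+)

message = 42
ciphertext = pow(message, e, n)    # encrypt: message^e mod n
recovered = pow(ciphertext, d, n)  # decrypt: ciphertext^d mod n
assert recovered == message
```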

IABIED wants to outlaw the publication of papers such as the famous Attention Is All You Need paper that introduced the transformer algorithm. But that leaves me confused as to how broad a ban they hope for.

Possibly none of the ideas that need to be banned are quite simple enough to be readily described on a t-shirt, but I’m hesitant to bet on that. I will bet that some of them would be hard for a regulator to recognize as relevant to AI. Matrix multiplication improvements are an example of a borderline case.

Low-level optimizations such as that could significantly influence how much compute is needed to create a dangerous AI.
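
To make the borderline case concrete, here is a sketch (my hypothetical example, not one from the book) of the kind of generic low-level optimization that reduces the compute cost of training while looking, to a regulator, like ordinary numerical computing:

```python
import numpy as np

def blocked_matmul(A, B, block=64):
    # Cache-friendly blocked matrix multiply: the same arithmetic as A @ B,
    # reorganized so each tile stays in fast memory. Nothing about it says
    # "AI research", yet tricks in this family shave real cost off training.
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for i in range(0, n, block):
        for p in range(0, k, block):
            for j in range(0, m, block):
                C[i:i+block, j:j+block] += A[i:i+block, p:p+block] @ B[p:p+block, j:j+block]
    return C

A, B = np.random.rand(200, 130), np.random.rand(130, 90)
assert np.allclose(blocked_matmul(A, B), A @ B)
```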

In addition, smaller innovations, especially those that just became important recently, are somewhat likely to be reinvented by multiple people. So I expect that there is a nontrivial set of advances for which a ban on publication would delay progress by less than a year.

In sum, a decades-long shutdown might require more drastic measures than IABIED indicates.

The restriction on GPU access also needs some clarification. It’s currently fairly easy to figure out which chips matter. But with draconian long-term restrictions on anything that’s classified as a GPU, someone is likely to get creative about building powerful chips that don’t fit the GPU classification. It doesn’t seem too urgent to solve this problem, but it’s important not to forget it.

IABIED often sounds like it’s saying that a long shutdown is our only hope. I doubt they’d explicitly endorse that claim. But I can imagine that the book will nudge readers into that conclusion.

I’m more optimistic than IABIED about other strategies. I don’t expect we’ll need a genius to propose good solutions. I’m fairly convinced that the hardest part is distinguishing good, but still risky, solutions from bad ones when we see them.

There are more ideas than I have time to evaluate for making AI development safer. Don’t let IABIED talk you into giving up on all of them.

Conclusion

Will IABIED be good enough to save us? It doesn’t seem persuasive enough to directly change the minds of a large fraction of voters. But it’s apparently good enough that important national security people have treated it as a reason to go public with their concerns. IABIED may prove to be highly valuable by persuading a large set of people that they can express their existing concerns without being branded as weird.

We are not living in normal times. Ask your favorite AI what AI company leaders think of the book’s arguments. Look at relevant prediction markets.

Continue Reading

A group of people from MIRI have published a mostly good introduction to the dangers of AI: The Problem. It is a step forward at improving the discussion of catastrophic risks from AI.

I agree with much of what MIRI writes there. I strongly agree with their near-term policy advice of prioritizing the creation of an off switch.

I somewhat disagree with their advice to halt (for a long time) progress toward ASI. We ought to make preparations in case a halt turns out to be important. But most of my hopes route through strategies that don’t need a halt.

A halt is both expensive and risky.

My biggest difference with MIRI is about how hard it is to adequately align an AI. Some related differences involve the idea of a pivotal act, and the expectation of a slippery slope between human-level AI and ASI.

Continue Reading

I’m having trouble keeping track of everything I’ve learned about AI and AI alignment in the past year or so. I’m writing this post in part to organize my thoughts, and to a lesser extent I’m hoping for feedback about what important new developments I’ve been neglecting. I’m sure that I haven’t noticed every development that I would consider important.

I’ve become a bit more optimistic about AI alignment in the past year or so.

I currently estimate a 7% chance AI will kill us all this century. That’s down from estimates that fluctuated from something like 10% to 40% over the past decade. (The extent to which those numbers fluctuate implies enough confusion that it only takes a little bit of evidence to move my estimate a lot.)

I’m also becoming more nervous about how close we are to human-level and transformative AGI. Not to mention feeling uncomfortable that I still don’t have a clear understanding of what I mean when I say human-level or transformative AGI.

Continue Reading

Book review: Human Compatible, by Stuart Russell.

Human Compatible provides an analysis of the long-term risks from artificial intelligence, by someone with a good deal more of the relevant prestige than any prior author on this subject.

What should I make of Russell? I skimmed his best-known book, Artificial Intelligence: A Modern Approach, and got the impression that it taught a bunch of ideas that were popular among academics, but which weren’t the focus of the people who were getting interesting AI results. So I guessed that people would be better off reading Deep Learning by Goodfellow, Bengio, and Courville instead. Human Compatible neither confirms nor dispels the impression that Russell is a bit too academic.

However, I now see that he was one of the pioneers of inverse reinforcement learning, which looks like a fairly significant advance that will likely become important someday (if it hasn’t already). So I’m inclined to treat him as a moderately good authority on AI.

The first half of the book is a somewhat historical view of AI, intended for readers who don’t know much about AI. It’s ok.

Continue Reading

Robin Hanson has been suggesting recently that we’ve been experiencing an AI boom that’s not too different from prior booms.

At the recent Foresight Vision Weekend, he predicted [not exactly – see the comments] a 20% decline in the number of Deepmind employees over the next year (Foresight asked all speakers to make a 1-year prediction).

I want to partly agree and partly disagree.

Continue Reading

Book review: The AI Does Not Hate You: Superintelligence, Rationality and the Race to Save the World, by Tom Chivers.

This book is a sympathetic portrayal of the rationalist movement by a quasi-outsider. It includes a well-organized explanation of why some people expect that AI will create large risks sometime this century, written in simple language that is suitable for a broad audience.

Caveat: I know many of the people who are described in the book. I’ve had some sort of connection with the rationalist movement since before it became distinct from transhumanism, and I’ve been mostly an insider since 2012. I read this book mainly because I was interested in how the rationalist movement looks to outsiders.

Chivers is a science writer. I normally avoid books by science writers, due to an impression that they mostly focus on telling interesting stories, without developing a deep understanding of the topics they write about.

Chivers’ understanding of the rationalist movement doesn’t quite qualify as deep, but he was surprisingly careful to read a lot about the subject, and to write only things he did understand.

Many times I reacted to something he wrote with “that’s close, but not quite right”. Usually when I reacted that way, Chivers did a good job of describing the rationalist message in question, and the main problem was either that rationalists haven’t figured out how to explain their ideas in a way that a broad audience can understand, or that rationalists are confused. So the complaints I make in the rest of this review are at most weakly directed at Chivers.

I saw two areas where Chivers overlooked something important.

Rationality

One involves CFAR.

Chivers wrote seven chapters on biases, and how rationalists view them, ending with “the most important bias”: knowing about biases can make you more biased. (italics his).

I get the impression that Chivers is sweeping this problem under the rug (Do we fight that bias by being aware of it? Didn’t we just read that that doesn’t work?). That is roughly what happened with many people who learned rationalism solely via written descriptions.

Then much later, when describing how he handled his conflicting attitudes toward the risks from AI, he gives a really great description of maybe 3% of what CFAR teaches (internal double crux), much like a blind man giving a really clear description of the upper half of an elephant’s trunk. He prefaces this narrative with the apt warning: “I am aware that this all sounds a bit mystical and self-helpy. It’s not.”

Chivers doesn’t seem to connect this exercise with the goal of overcoming biases. Maybe he was too busy applying the technique on an important problem to notice the connection with his prior discussions of Bayes, biases, and sanity. It would be reasonable for him to argue that CFAR’s ideas have diverged enough to belong in a separate category, but he seems to put them in a different category by accident, without realizing that many of us consider CFAR to be an important continuation of rationalists’ interest in biases.

World conquest

Chivers comes very close to covering all of the layman-accessible claims that Yudkowsky and Bostrom make. My one complaint here is that he only gives vague hints about why one bad AI can’t be stopped by other AIs.

A key claim of many leading rationalists is that AI will have some winner-take-all dynamics that will lead to one AI having a decisive strategic advantage after it crosses some key threshold, such as human-level intelligence.

This is a controversial position that is somewhat connected to foom (fast takeoff), but which might be correct even without foom.

Utility functions

“If I stop caring about chess, that won’t help me win any chess games, now will it?” – That chapter title provides a good explanation of why a simple AI would continue caring about its most fundamental goals.

Is that also true of an AI with more complex, human-like goals? Chivers is partly successful at explaining how to apply the concept of a utility function to a human-like intelligence. Rationalists (or at least those who actively research AI safety) have a clear meaning here, at least as applied to agents that can be modeled mathematically. But when laymen try to apply that to humans, confusion abounds, due to the ease of conflating subgoals with ultimate goals.

Chivers tries to clarify, using the story of Odysseus and the Sirens, and claims that the Sirens would rewrite Odysseus’ utility function. I’m not sure how we can verify that the Sirens work that way, or whether they would merely persuade Odysseus to make false predictions about his expected utility. Chivers at least states clearly that the Sirens try to prevent Odysseus (by making him run aground) from doing what his pre-Siren utility function advises. Chivers’ point could be a bit clearer if he specified that in his (nonstandard?) version of the story, the Sirens make Odysseus want to run aground.
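
Here is a minimal sketch of the distinction I’m drawing (my toy model, not anything from Chivers or the AI safety literature): the Sirens could rewrite Odysseus’ utility function, or they could leave it alone and merely corrupt his predictions. Either intervention produces the same fatal choice, which is why the story underdetermines the mechanism.

```python
def expected_utility(action, utility, predict):
    # Standard expected-utility calculation over a predicted outcome distribution.
    return sum(prob * utility(outcome) for outcome, prob in predict(action).items())

# Odysseus' original utility function: reaching home is good, wrecking is bad.
original_utility = {"reach_home": 1.0, "run_aground": -10.0}.get

def honest_predict(action):
    return {"sail_on":             {"reach_home": 0.9, "run_aground": 0.1},
            "steer_toward_sirens": {"reach_home": 0.0, "run_aground": 1.0}}[action]

# Intervention 1: beliefs corrupted, utility function untouched.
def siren_predict(action):
    return {"sail_on":             {"reach_home": 0.1, "run_aground": 0.9},
            "steer_toward_sirens": {"reach_home": 1.0, "run_aground": 0.0}}[action]

# Intervention 2: utility function rewritten, beliefs untouched.
rewritten_utility = {"reach_home": -1.0, "run_aground": 10.0}.get

# Either intervention makes steering toward the Sirens look like the best action.
for label, utility, predict in [("corrupted beliefs", original_utility, siren_predict),
                                ("rewritten utility", rewritten_utility, honest_predict)]:
    best = max(["sail_on", "steer_toward_sirens"],
               key=lambda a: expected_utility(a, utility, predict))
    print(label, "->", best)   # both print "steer_toward_sirens"
```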

Philosophy

“Essentially, he [Yudkowsky] (and the Rationalists) are thoroughgoing utilitarians.” – That’s a bit misleading. Leading rationalists are predominantly consequentialists, but mostly avoid committing to a moral system as specific as utilitarianism. Leading rationalists also mostly endorse moral uncertainty. Rationalists mostly endorse utilitarian-style calculation (which entails some of the controversial features of utilitarianism), but are careful to combine that with worry about whether we’re optimizing the quantity that we want to optimize.

I also recommend Utilitarianism and its discontents as an example of one rationalist’s nuanced partial endorsement of utilitarianism.

Political solutions to AI risk?

Chivers describes Holden Karnofsky as wanting “to get governments and tech companies to sign treaties saying they’ll submit any AGI designs to outside scrutiny before switching them on. It wouldn’t be iron-clad, because firms might simply lie”.

Most rationalists seem pessimistic about treaties such as this.

Lying is hardly the only problem. This idea assumes that there will be a tiny number of attempts, each with a very small number of launches that look like the real thing, as happened with the first moon landing and the first atomic bomb. Yet the history of software development suggests it will be something more like hundreds of attempts that look like they might succeed. I wouldn’t be surprised if there are millions of times when an AI is turned on, and the developer has some hope that this time it will grow into a human-level AGI. There’s no way that a large number of designs will get sufficient outside scrutiny to be of much use.

And if a developer is trying new versions of their system once a day (e.g. making small changes to a number that controls, say, openness to new experience), any requirement to submit all new versions for outside scrutiny would cause large delays, creating large incentives to subvert the requirement.

So any realistic treaty would need provisions that identify a relatively small set of design choices that need to be scrutinized.

I see few signs that any experts are close to developing a consensus about what criteria would be appropriate here, and I expect that doing so would require a significant fraction of the total wisdom needed for AI safety. I discussed my hope for one such criterion in my review of Drexler’s Reframing Superintelligence paper.

Rationalist personalities

Chivers mentions several plausible explanations for what he labels the “semi-death of LessWrong”, the most obvious being that Eliezer Yudkowsky finished most of the blogging that he had wanted to do there. But I’m puzzled by one explanation that Chivers reports: “the attitude … of thinking they can rebuild everything”. Quoting Robin Hanson:

At Xanadu they had to do everything different: they had to organize their meetings differently and orient their screens differently and hire a different kind of manager, everything had to be different because they were creative types and full of themselves. And that’s the kind of people who started the Rationalists.

That seems like a partly apt explanation for the demise of the rationalist startups MetaMed and Arbital. But LessWrong mostly copied existing sites, such as Reddit, and was only ambitious in the sense that Eliezer was ambitious about what ideas to communicate.

Culture

I guess a book about rationalists can’t resist mentioning polyamory. “For instance, for a lot of people it would be difficult not to be jealous.” Yes, when I lived in a mostly monogamous culture, jealousy seemed pretty standard. That attitude melted away when the Bay Area cultures that I associated with started adopting polyamory or something similar (shortly before the rationalists became a culture). Jealousy has much more purpose if my partner is flirting with monogamous people than if he’s flirting with polyamorists.

Less dramatically, “We all know people who are afraid of visiting their city centres because of terrorist attacks, but don’t think twice about driving to work.”

This suggests some weird filter bubbles somewhere. I thought that fear of cities got forgotten within a month or so after 9/11. Is this a difference between London and the US? Am I out of touch with popular concerns? Does Chivers associate more with paranoid people than I do? I don’t see any obvious answer.

Conclusion

It would be really nice if Chivers and Yudkowsky could team up to write a book, but this book is a close substitute for such a collaboration.

See also Scott Aaronson’s review.

Eric Drexler has published a book-length paper on AI risk, describing an approach that he calls Comprehensive AI Services (CAIS).

His primary goal seems to be reframing AI risk discussions to use a rather different paradigm than the one that Nick Bostrom and Eliezer Yudkowsky have been promoting. (There isn’t yet any paradigm that’s widely accepted, so this isn’t a Kuhnian paradigm shift; it’s better characterized as an amorphous field that is struggling to establish its first paradigm). Dueling paradigms seems to be the best that the AI safety field can manage to achieve for now.

I’ll start by mentioning some important claims that Drexler doesn’t dispute:

  • an intelligence explosion might happen somewhat suddenly, in the fairly near future;
  • it’s hard to reliably align an AI’s values with human values;
  • recursive self-improvement, as imagined by Bostrom / Yudkowsky, would pose significant dangers.

Drexler likely disagrees about some of the claims made by Bostrom / Yudkowsky on those points, but he shares enough of their concerns about them that those disagreements don’t explain why Drexler approaches AI safety differently. (Drexler is more cautious than most writers about making any predictions concerning these three claims).

CAIS isn’t a full solution to AI risks. Instead, it’s better thought of as an attempt to reduce the risk of world conquest by the first AGI that reaches some threshold, preserve existing corrigibility somewhat past human-level AI, and postpone the need for a permanent solution until we have more intelligence.

Continue Reading

Book review: Inadequate Equilibria, by Eliezer Yudkowsky.

This book (actually halfway between a book and a series of blog posts) attacks the goal of epistemic modesty, which I’ll loosely summarize as reluctance to believe that one knows better than the average person.

1.

The book starts by focusing on the base rate for high-status institutions having harmful incentive structures, charting a middle ground between the excessive respect for those institutions that we see in mainstream sources, and the cynicism of most outsiders.

There’s a weak sense in which this is arrogant, namely that if it were obvious to the average voter how to improve on these problems, then I’d expect the problems to be fixed. So people who claim to detect such problems ought to have decent evidence that they’re above average in the relevant skills. There are plenty of people who can rationally decide that applies to them. (Eliezer doubts that advising the rest to be modest will help; I suspect there are useful approaches to instilling modesty in people who should be more modest, but it’s not easy). Also, below-average people rarely seem to be attracted to Eliezer’s writings.

Later parts of the book focus on more personal choices, such as choosing a career.

Some parts of the book seem designed to show off Eliezer’s lack of need for modesty – sometimes successfully, sometimes leaving me suspecting he should be more modest (usually in ways that are somewhat orthogonal to his main points; i.e. his complaints about “reference class tennis” suggest overconfidence in his understanding of his debate opponents).

2.

Eliezer goes a bit overboard in attacking the outside view. He starts with legitimate complaints about people misusing it to justify rejecting theory and adopting “blind empiricism” (a mistake that I’ve occasionally made). But he partly rejects the advice that Tetlock gives in Superforecasting. I’m pretty sure Tetlock knows more about this domain than Eliezer does.

E.g. Eliezer says “But in novel situations where causal mechanisms differ, the outside view fails—there may not be relevantly similar cases, or it may be ambiguous which similar-looking cases are the right ones to look at.”, but Tetlock says ‘Nothing is 100% “unique” … So superforecasters conduct creative searches for comparison classes even for seemingly unique events’.

Compare Eliezer’s “But in many contexts, the outside view simply can’t compete with a good theory” with Tetlock’s commandment number 3 (“Strike the right balance between inside and outside views”). Eliezer seems to treat the approaches as antagonistic, whereas Tetlock advises us to find a synthesis in which the approaches cooperate.

3.

Eliezer provides a decent outline of what causes excess modesty. He classifies the two main failure modes as anxious underconfidence, and status regulation. Anxious underconfidence definitely sounds like something I’ve felt somewhat often, and status regulation seems pretty plausible, but harder for me to detect.

Eliezer presents a clear model of why status regulation exists, but his explanation for anxious underconfidence doesn’t seem complete. Here are some of my ideas about possible causes of anxious underconfidence:

  • People evaluate mistaken career choices and social rejection as if they meant death (which was roughly true until quite recently), so extreme risk aversion made sense;
  • Inaction (or choosing the default action) minimizes blame. If I carefully consider an option, my choice says more about my future actions than if I neglect to think about the option;
  • People often evaluate their success at life by counting the number of correct and incorrect decisions, rather than adding up the value produced;
  • People who don’t grok the Bayesian meaning of the word “evidence” are likely to privilege the scientific and legal meanings of evidence. So beliefs based on more subjective evidence get treated as second-class citizens (sketched below).
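
A minimal sketch of what I mean by the Bayesian sense of “evidence” (my illustration, not Eliezer’s wording): anything more likely under a hypothesis than under its negation should shift your belief, even when it would never count as scientific or legal proof.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    # Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E)
    return (p_e_given_h * prior) / (p_e_given_h * prior + p_e_given_not_h * (1 - prior))

# A weak, subjective observation (a mere 2:1 likelihood ratio) still moves belief:
print(posterior(prior=0.5, p_e_given_h=0.6, p_e_given_not_h=0.3))  # ~0.667
```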

I suspect that most harm from excess modesty (and also arrogance) happens in evolutionarily novel contexts. Decisions such as creating a business plan for a startup, or writing a novel that sells a million copies, are sufficiently different from what we evolved to do that we should expect over/underconfidence to cause more harm.

4.

Another way to summarize the book would be: don’t aim to overcompensate for overconfidence; instead, aim to eliminate the causes of overconfidence.

This book will be moderately popular among Eliezer’s fans, but it seems unlikely to greatly expand his influence.

It didn’t convince me that epistemic modesty is generally harmful, but it does provide clues to identifying significant domains in which epistemic modesty causes important harm.

Or, why I don’t fear the p-zombie apocalypse.

This post analyzes concerns about how evolution, in the absence of a powerful singleton, might, in the distant future, produce what Nick Bostrom calls a “Disneyland without children”. I.e. a future with many agents, whose existence we don’t value because they are missing some important human-like quality.

The most serious description of this concern is in Bostrom’s The Future of Human Evolution. Bostrom is cautious enough that it’s hard to disagree with anything he says.

Age of Em has prompted a batch of similar concerns. Scott Alexander at SlateStarCodex has one of the better discussions (see section IV of his review of Age of Em).

People sometimes sound like they want to use this worry as an excuse to oppose the age of em scenario, but it applies to just about any scenario with human-in-a-broad-sense actors. If uploading never happens, biological evolution could produce slower paths to the same problem(s) [1]. Even in the case of a singleton AI, the singleton will need to solve the tension between evolution and our desire to preserve our values, although in that scenario it’s more important to focus on how the singleton is designed.

These concerns often assume something like the age of em lasts forever. The scenario which Age of Em analyzes seems unstable, in that it’s likely to be altered by stranger-than-human intelligence. But concerns about evolution only depend on control being sufficiently decentralized that there’s doubt about whether a central government can strongly enforce rules. That situation seems sufficiently stable to be worth analyzing.

I’ll refer to this thing we care about as X (qualia? consciousness? fun?), but I expect people will disagree on what matters for quite some time. Some people will worry that X is lost in uploading, others will worry that some later optimization process will remove X from some future generation of ems.

I’ll first analyze scenarios in which X is a single feature (in the sense that it would be lost in a single step). Later, I’ll try to analyze the other extreme, where X is something that could be lost in millions of tiny steps. Neither extreme seems likely, but I expect that analyzing the extremes will illustrate the important principles.

Continue Reading

The paper When Will AI Exceed Human Performance? Evidence from AI Experts reports that ML researchers expect AI to create a 5% chance of “Extremely bad (e.g. human extinction)” consequences, yet they’re quite divided over whether that implies it’s an important problem to work on.

Slate Star Codex expresses confusion about and/or disapproval of (a slightly different manifestation of) this apparent paradox. It’s a pretty clear sign that something is suboptimal.

Here are some conjectures (not designed to be at all mutually exclusive).
Continue Reading