Artificial Intelligence

I’m having trouble keeping track of everything I’ve learned about AI and AI alignment in the past year or so. I’m writing this post in part to organize my thoughts, and to a lesser extent I’m hoping for feedback about what important new developments I’ve been neglecting. I’m sure that I haven’t noticed every development that I would consider important.

I’ve become a bit more optimistic about AI alignment in the past year or so.

I currently estimate a 7% chance AI will kill us all this century. That’s down from estimates that fluctuated from something like 10% to 40% over the past decade. (The extent to which those numbers fluctuate implies enough confusion that it only takes a little bit of evidence to move my estimate a lot.)

I’m also becoming more nervous about how close we are to human-level and transformative AGI, not to mention uncomfortable that I still don’t have a clear understanding of what I mean when I say human-level or transformative AGI.

Continue Reading

AI looks likely to cause major changes to society over the next decade.

Financial markets have mostly not reacted to this forecast yet. I expect it will be at least a few months, maybe even years, before markets have a large reaction to AI. I’d much rather buy too early than too late, so I’m trying to reposition my investments this winter to prepare for AI.

This post will focus on scenarios where AI reaches roughly human levels sometime around 2030 to 2035, and has effects that are at most 10 times as dramatic as the industrial revolution. I’m not confident that such scenarios are realistic. I’m only saying that they’re plausible enough to affect my investment strategies.

Continue Reading

Blog post review: LOVE in a simbox.

Jake Cannell has a very interesting post on LessWrong called LOVE in a simbox is all you need, with potentially important implications for AGI alignment. (LOVE stands for Learning Other’s Values or Empowerment.)

Alas, he organized it so that the most alignment-relevant ideas are near the end of a long-winded discussion of topics whose alignment relevance seems somewhat marginal. I suspect many people gave up before reaching the best sections.

I will summarize and review the post in roughly the opposite order, in hopes of appealing to a different audience. I’ll likely create a different set of misunderstandings from what Jake’s post has created. Hopefully this different perspective will help readers triangulate on some hypotheses that are worth further analysis.

Continue Reading

Book review: What We Owe the Future, by William MacAskill.

WWOTF is a mostly good book that can’t quite decide whether it’s part of an activist movement, or aimed at a small niche of philosophy.

MacAskill wants to move us closer to utilitarianism, particularly in the sense of evaluating the effects of our actions on people who live in the distant future. Future people are real, and we have some sort of obligation to them.

WWOTF describes humanity’s current behavior as reckless, like an imprudent teenager. MacAskill almost killed himself as a teen, by taking a poorly thought out risk. Humanity is taking similar thoughtless risks.

MacAskill carefully avoids endorsing the aspect of utilitarianism that says everyone must be valued equally. That saves him from a number of conclusions that make utilitarianism unpopular. E.g. it allows him to be uncertain about how much to care about animal welfare. It allows him to ignore the difficult arguments about the morally correct discount rate.

Continue Reading

Approximately a book review: Eric Drexler’s QNR paper.

[Epistemic status: very much pushing the limits of my understanding. I’ve likely made several times as many mistakes as in my average blog post. I want to devote more time to understanding these topics, but it’s taken me months to produce this much, and if I delayed this in hopes of producing something better, who knows when I’d be ready.]

This nearly-a-book elaborates on his CAIS paper (mainly chapters 37 through 39), describing a path for AI capability research that enables the CAIS approach to remain competitive as capabilities exceed human levels.

AI research has been split between symbolic and connectionist camps for as long as I can remember. Drexler says it’s time to combine those approaches to produce systems which are more powerful than either approach can be by itself.

He suggests a general framework for how to usefully combine neural networks and symbolic AI. It’s built around structures that combine natural language words with neural representations of what those words mean.
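To make that idea a bit more concrete, here is a minimal toy sketch of what such a structure might look like. The class names and fields below are my own illustration, not notation from Drexler’s paper; in a real system the embeddings would come from a trained network rather than random numbers.

```python
# A toy quasilinguistic node: a natural-language symbol paired with a
# neural embedding of its meaning, plus links to related nodes.
# (My own illustrative construction, not Drexler's notation.)
from dataclasses import dataclass, field
from typing import List

import numpy as np


@dataclass
class QNRNode:
    word: str                    # human-readable symbol, e.g. "lever"
    embedding: np.ndarray        # neural representation of what the word means
    neighbors: List["QNRNode"] = field(default_factory=list)  # graph links


def similarity(a: QNRNode, b: QNRNode) -> float:
    """Cosine similarity between two nodes' neural representations."""
    return float(
        np.dot(a.embedding, b.embedding)
        / (np.linalg.norm(a.embedding) * np.linalg.norm(b.embedding))
    )


# Toy usage: random vectors stand in for embeddings from a trained network.
rng = np.random.default_rng(0)
lever = QNRNode("lever", rng.normal(size=64))
crowbar = QNRNode("crowbar", rng.normal(size=64))
lever.neighbors.append(crowbar)
print(f"similarity(lever, crowbar) = {similarity(lever, crowbar):.3f}")
```

The point of the pairing is that the symbolic side stays legible to humans and to symbolic reasoning, while the neural side carries the fuzzier meaning that networks learn.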

Drexler wrote this mainly for AI researchers. I will attempt to explain it to a slightly broader audience.

Continue Reading

This post is mostly a response to the Foresight Institute’s book Gaming the Future, which is very optimistic about AIs being cooperative. They expect that creating a variety of different AIs will enable us to replicate the checks and balances that the US Constitution created.

I’m also responding in part to Eliezer’s list of AGI lethalities, points 34 and 35, which say that we can’t survive the creation of powerful AGIs simply by ensuring the existence of many co-equal AGIs with different goals. One of his concerns is that those AGIs will cooperate with each other enough to function as a unitary AGI. Interactions between AGIs might fit the ideal of voluntary cooperation with checks and balances, yet when interacting with humans those AGIs might function as an unchecked government that has little need for humans.

I expect reality to be somewhere in between those two extremes. I can’t tell which of those views is closer to reality. This is a fairly scary uncertainty.

Continue Reading

[Epistemic status: mostly writing to clarify my intuitions, with just a few weak attempts to convince others. It’s no substitute for reading Drexler’s writings.]

I’ve been struggling to write more posts relating to Drexler’s vision for AI (hopefully to be published soon), and in the process got increasingly bothered by the issue of whether AI researchers will see incentives to give AIs broad goals that turn them into agents.

Drexler’s CAIS paper convinced me that our current trajectory is somewhat close to a scenario where human-level AIs that are tool-like services are available well before AGIs with broader goals.

Yet when I read LessWrong, I sympathize with beliefs that developers will want quite agenty AGIs around the same time that CAIS-like services reach human levels.

I’m fed up with this epistemic learned helplessness, and this post is my attempt to reconcile those competing intuitions.

Continue Reading

I’ve been pondering whether we’ll get any further warnings before AI(s) exceed human levels at general-purpose tasks, and whether doing so would entail enough risk that AI researchers ought to take some precautions. I feel pretty uncertain about this.

I haven’t even been able to make useful progress at clarifying what I mean by that threshold of general intelligence.

As a weak substitute, I’ve brainstormed a bunch of scenarios describing not-obviously-wrong ways in which people might notice, or fail to notice, that AI is transforming the world.

I’ve given probabilities for each scenario, which I’ve pulled out of my ass and don’t plan to defend.

Continue Reading

Book review: The Alignment Problem: Machine Learning and Human Values, by Brian Christian.

I was initially skeptical of Christian’s focus on problems with AI as it exists today. Most writers with this focus miss the scale of catastrophe that could result from AIs that are smart enough to subjugate us.

Christian mostly writes about problems that are visible in existing AIs. Yet he organizes his discussion of near-term risks in ways that don’t pander to near-sighted concerns, and which nudge readers in the direction of wondering whether today’s mistakes represent the tip of an iceberg.

Most of the book carefully avoids alarmist or emotional tones. It’s hard to tell whether he has an opinion on how serious a threat unaligned AI will be – presumably it’s serious enough to write a book about?

Could the threat be more serious than that implies? Christian notes, without indicating his own opinion, that some people think so:

A growing chorus within the AI community … believes, if we are not sufficiently careful, that this is literally how the world will end. And – for today at least – the humans have lost the game.

Continue Reading