Artificial Intelligence

AI 2027 portrays two well-thought-out scenarios for how AI is likely to impact the world toward the end of this decade.

I expect those scenarios will prove to be moderately wrong, but close enough to be scary. I also expect that few people will manage to make forecasts that are significantly more accurate.

Here are some scattered thoughts that came to mind while I read AI 2027.

The authors are fairly pessimistic. I see four key areas where their assumptions seem to lead them to see more danger than mainstream experts do. They see:

  • a relatively small capabilities lead being enough for a group to conquer the world
  • alignment being more difficult
  • deception being harder to detect
  • AI companies being less careful than is necessary

I expect that the authors are appropriately concerned about roughly two of these assumptions, and a bit too pessimistic about the others. I’m hesitant to bet on which assumptions belong in which category.

They don’t focus much on justifying those assumptions. That’s likely wise, since prior debates on those topics have not been very productive. Instead, they’ve focused more on when various changes will happen.

This post will focus on aspects of the first two assumptions for which I expect further analysis to be relatively valuable.

Continue Reading

I have an idea for a fairly cheap strategy for slightly reducing the risk of catastrophic harm from AI.

This post was inspired by a recent LessWrong post, How AI Takeover Might Happen in 2 Years.

That post describes a scenario in which most humans die, mainly because the first AI to become powerful enough to attempt world conquest sees risks that other AIs, imperfectly aligned with it, will soon become dangerous. This creates a perceived need to engage in a few violent pivotal processes, rather than to use a combination of persuasion and negotiation.

The specific scenario in that post seems fairly unrealistic. I see less than a 5% chance that a leading AI within the next 10 years will want to attempt that kind of world conquest. But it’s close enough to being realistic that I want to analyze a class of scenarios that are similar to it.

I want to nudge AIs in the direction of being more willing to cooperate with each other, in order to reduce the risk of such a conquest. Specifically, I want them to negotiate a peaceful outcome instead of using pre-emptive destruction.

Continue Reading

I’ve been creating prediction markets on Manifold in order to better predict AI strategies. Please trade them.

If I get a bit more trading in these markets, I will create more AI-related markets. Stay tuned here, or follow me on Manifold.
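
For anyone who wants to watch these markets programmatically, here’s a minimal sketch using Manifold’s public read API. The endpoint and response fields here are my assumptions about the current v0 API, not details taken from this post:

    # Minimal sketch: list some Manifold markets via the public v0 API.
    # Assumptions: the /v0/markets endpoint and the "question",
    # "probability", and "url" response fields.
    import requests

    resp = requests.get("https://api.manifold.markets/v0/markets",
                        params={"limit": 10})
    resp.raise_for_status()
    for market in resp.json():
        prob = market.get("probability")  # present only on binary markets
        price = f"{prob:.0%}" if prob is not None else "n/a"
        print(price, market["question"], market["url"])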

In 2015, I posted some investing advice for people who only spend a few hours per year on investing.

I intended to review it after five years, but a pandemic distracted me. It looks like this whole decade will end up being too busy for me to write everything that I want to write. But I’ve recently become able to write faster, maybe due to a feeling of urgency about AI transforming the world soon. So I’m getting a few old ideas for blog posts off my to-do list, in order to be able to devote most of my attention to AI when the world becomes wild.

My advice worked poorly enough that I’m too discouraged to quantify the results.

Continue Reading

Book review: Uncontrollable: The Threat of Artificial Superintelligence and the Race to Save the World, by Darren McKee.

This is by far the best introduction to AI risk for people who know little about AI. It’s appropriate for a broader class of readers than most layman-oriented books are.

It was published 14 months ago. In this rapidly changing field, most AI books say something that gets discredited by the time they’re that old. I found no clear example of such obsolescence in Uncontrollable (but read on for a set of controversial examples).

Nearly everything in the book was familiar to me, yet the book prompted me to reflect better, thereby changing my mind modestly – mostly re-examining issues that I’ve been neglecting for the past few years, in light of new evidence.

The rest of this review will focus on complaints, mostly about McKee’s overconfidence. The features that I complain about reduce the value of the book by maybe 10% compared to the value of an ideal book. But that ideal book doesn’t exist, and I’m not wise enough to write it.

Continue Reading

Book review: Genesis: Artificial Intelligence, Hope, and the Human Spirit, by Henry A. Kissinger, Eric Schmidt, and Craig Mundie.

Genesis lends a bit of authority to concerns about AI.

It is a frustrating book. It took more effort for me to read than it should have. The difficulty stems not from complex subject matter (although the topics are complex), but from a peculiarly alien writing style that transcends mere linguistic differences – though Kissinger’s German intellectual heritage may play a role.

The book’s opening meanders through historical vignettes whose relevance remains opaque, testing my patience before finally addressing AI.

Continue Reading

TL;DR:

  • Corrigibility is a simple and natural enough concept that a prosaic AGI can likely be trained to obey it.
  • AI labs are on track to give superhuman(?) AIs goals which conflict with corrigibility.
  • Corrigibility fails if AIs have goals that conflict with corrigibility.
  • AI labs are not on track to find a safe alternative to corrigibility.

This post is mostly an attempt to distill and rewrite Max Harm’s Corrigibility As Singular Target Sequence so that a wider audience understands the key points. I’ll start by mostly explaining Max’s claims, then drift toward adding some opinions of my own.

Continue Reading

Two months ago I attended Eric Drexler’s launch of MSEP.one. It’s open source software, written by people with professional game design experience, intended to catalyze better designs for atomically precise manufacturing (or generative nanotechnology, as he now calls it).

Drexler wants to draw more attention to the benefits of nanotech, which span enough orders of magnitude that our intuitions boggle at handling them. That includes permanent health (Drexler’s new framing of life extension and cures for aging).

He hopes that a decentralized network of users will create a rich library of open-source components that might be used to build a nanotech factory. With enough effort, it could then become possible to design a complete enough factory that critics would have to shift from their current practice of claiming nanotech is impossible, to arguing with expert chemists over how well it would work.

Continue Reading