existential risks


Book review: The Coming Wave: Technology, Power, and the Twenty-first Century’s Greatest Dilemma, by Mustafa Suleyman.

An author with substantial AI expertise has attempted to discuss AI in terms that the average book reader can understand.

The key message: AI is about to become possibly the most important development in human history.

Maybe 2% of readers will change their minds as a result of reading the book.

A large fraction of readers will come in expecting the book to be mostly hype. They won’t look closely enough to see why Suleyman is excited.

Continue Reading

Context: looking for an alternative to a pause on AI development.

There’s some popular desire for software to be explainable when it’s used for decisions such as whether to grant someone a loan. That desire is not, by itself, sufficient reason to risk crippling AI progress. But in combination with other concerns about AI, it seems promising.

Much of this popular desire likely comes from people who have been (or expect to be) denied loans, and who want to scapegoat someone or something to avoid admitting that they look unsafe to lend to because they’ve made poor decisions. I normally want to avoid regulations that are supported by such motives.

Yet an explainability requirement shows some promise at reducing the risks from rogue AIs.

Continue Reading

Robin Hanson, partly in response to calls for a pause in AGI development, suggests liability rules for the risks of AGI rapidly becoming powerful (foom).

My intuitive reaction was to classify foom liability as equivalent to a near-total ban on AGI.

Now that I’ve found time to think more carefully about it, I want to advocate foom liability as a modest improvement over any likely pause or ban on AGI research. In particular, I want the most ambitious AI labs worldwide to be required to have insurance against something like $10 billion to $100 billion worth of damages.
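A bit of back-of-the-envelope arithmetic shows why foom liability can range from a mild tax to something close to a ban, depending on how risky insurers judge a lab to be. The sketch below is purely illustrative; the probabilities and loading factor are assumptions of mine, not numbers from Hanson's proposal.

```python
# Illustrative only: rough actuarially fair premium for mandatory foom-liability insurance.
# The loss size, probabilities, and loading factor are made-up assumptions,
# not figures from Hanson's proposal or from any actual insurer.

def annual_premium(prob_of_loss, coverage, loading=2.0):
    """Expected annual payout times a loading factor for insurer profit and uncertainty."""
    return prob_of_loss * coverage * loading

coverage = 100e9  # $100 billion of required coverage

for p in (1e-4, 1e-3, 1e-2):
    print(f"P(foom-scale loss) = {p:.2%} -> premium ~ ${annual_premium(p, coverage)/1e9:.2f} billion/year")

# If insurers put the annual chance of a foom-scale disaster around 1%, the premium
# alone rivals a leading lab's budget, which is roughly why liability can feel like a
# near-total ban; if they see the risk as remote, the cost stays modest.
```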

Continue Reading

I previously said:

I see little hope of a good agreement to pause AI development unless leading AI researchers agree that a pause is needed, and help write the rules. Even with that kind of expert help, there’s a large risk that the rules will be ineffective and cause arbitrary collateral damage.

Yoshua Bengio has a reputation that makes him one of the best people to turn to for such guidance. He has now suggested restrictions on AI development that are targeted specifically at agenty AI.

If turned into a clear guideline, that would be a much more desirable method of slowing the development of dangerous AI. Alas, Bengio seems to admit that he isn’t yet able to provide that clarity.

Continue Reading

I’m having trouble keeping track of everything I’ve learned about AI and AI alignment in the past year or so. I’m writing this post in part to organize my thoughts, and to a lesser extent I’m hoping for feedback about what important new developments I’ve been neglecting. I’m sure that I haven’t noticed every development that I would consider important.

I’ve become a bit more optimistic about AI alignment in the past year or so.

I currently estimate a 7% chance AI will kill us all this century. That’s down from estimates that fluctuated from something like 10% to 40% over the past decade. (The extent to which those numbers fluctuate implies enough confusion that it only takes a little bit of evidence to move my estimate a lot.)
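To illustrate the point in that parenthetical, here is a minimal Bayesian update in odds form, showing how a modest likelihood ratio moves a mid-range probability by quite a lot. The numbers are invented for the example; they aren't the inputs behind my 7% figure.

```python
# Toy illustration: a modestly reassuring piece of evidence moves a mid-range
# doom estimate substantially. The prior and likelihood ratio are invented.

def bayes_update(prior_prob, likelihood_ratio):
    """Posterior probability via odds form: posterior odds = prior odds * LR."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

prior = 0.20                 # a mid-range estimate from the past decade
reassuring_evidence = 1 / 3  # evidence judged 3x likelier if alignment turns out tractable

print(f"{prior:.0%} -> {bayes_update(prior, reassuring_evidence):.0%}")  # 20% -> 8%
```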

I’m also becoming more nervous about how close we are to human-level and transformative AGI. Not to mention feeling uncomfortable that I still don’t have a clear understanding of what I mean when I say human-level or transformative AGI.

Continue Reading

Blog post review: LOVE in a simbox.

Jake Cannell has a very interesting post on LessWrong called LOVE in a simbox is all you need, with potentially important implications for AGI alignment. (LOVE stands for Learning Other’s Values or Empowerment.)

Alas, he organized it so that the most alignment-relevant ideas are near the end of a long-winded discussion of topics whose alignment relevance seems somewhat marginal. I suspect many people gave up before reaching the best sections.

I will summarize and review the post in roughly the opposite order, in hopes of appealing to a different audience. I’ll likely create a different set of misunderstandings from what Jake’s post has created. Hopefully this different perspective will help readers triangulate on some hypotheses that are worth further analysis.

Continue Reading