Context: looking for an alternative to a pause on AI development.
There’s some popular desire for software decisions to be explainable when the software is used for choices such as whether to grant someone a loan. That desire is not sufficient reason for possibly crippling AI progress. But in combination with other concerns about AI, an explainability requirement seems promising.
Much of this popular desire likely comes from people who have been (or expect to be) denied loans, and who want to scapegoat someone or something to avoid admitting that they look unsafe to lend to because they’ve made poor decisions. I normally want to avoid regulations that are supported by such motives.
Yet an explainability requirement shows some promise at reducing the risks from rogue AIs.
I’m wondering how selection effects will influence the first serious attempt by an AGI to take over the world.
My question here is inspired by thoughts about people who say AGI couldn’t conquer the world because it will depend on humans to provide electricity, semiconductors, etc.
I participated last summer in Tetlock’s Existential Risk Persuasion Tournament (755(!) page paper here).
Superforecasters and “subject matter experts” engaged in a hybrid between a prediction market and debates, to predict catastrophic and existential risks this century.
Partly in response to calls for a pause in development of AGI, Robin Hanson suggests liability rules for risks related to AGI rapidly becoming powerful.
My intuitive reaction was to classify foom liability as equivalent to a near total ban on AGI.
Now that I’ve found time to think more carefully about it, I want to advocate foom liability as a modest improvement over any likely pause or ban on AGI research. In particular, I want the most ambitious AI labs worldwide to be required to have insurance against something like $10 billion to $100 billion worth of damages.
I previously said:
“I see little hope of a good agreement to pause AI development unless leading AI researchers agree that a pause is needed, and help write the rules. Even with that kind of expert help, there’s a large risk that the rules will be ineffective and cause arbitrary collateral damage.”
Yoshua Bengio has a reputation that makes him one of the best people to turn to for such guidance. He has now suggested restrictions on AI development that are targeted specifically at agenty AI.
If turned into a clear guideline, that would be a much more desirable method of slowing the development of dangerous AI. Alas, Bengio seems to admit that he isn’t yet able to provide that clarity.
OpenAI has told us in some detail what they’ve done to make GPT-4 safe.
This post will complain about some misguided aspects of OpenAI’s goals.
I like the basic idea of a pause in training increasingly powerful AIs. Yet I’m quite dissatisfied with any specific plan that I can think of.
AI research is proceeding at a reckless pace. There’s massive disagreement among intelligent people as to how dangerous this is.
I’m having trouble keeping track of everything I’ve learned about AI and AI alignment in the past year or so. I’m writing this post in part to organize my thoughts, and to a lesser extent I’m hoping for feedback about what important new developments I’ve been neglecting. I’m sure that I haven’t noticed every development that I would consider important.
I’ve become a bit more optimistic about AI alignment in the past year or so.
I currently estimate a 7% chance AI will kill us all this century. That’s down from estimates that fluctuated from something like 10% to 40% over the past decade. (The extent to which those numbers fluctuate implies enough confusion that it only takes a little bit of evidence to move my estimate a lot.)
I’m also becoming more nervous about how close we are to human-level and transformative AGI. And I’m uncomfortable that I still don’t have a clear understanding of what I mean when I say human-level or transformative AGI.
Blog post review: LOVE in a simbox.
Jake Cannell has a very interesting post on LessWrong called LOVE in a simbox is all you need, with potentially important implications for AGI alignment. (LOVE stands for Learning Other’s Values or Empowerment.)
Alas, he organized it so that the most alignment-relevant ideas are near the end of a long-winded discussion of topics whose alignment relevance seems somewhat marginal. I suspect many people gave up before reaching the best sections.
I will summarize and review the post in roughly the opposite order, in hopes of appealing to a different audience. I’ll likely create a different set of misunderstandings from what Jake’s post has created. Hopefully this different perspective will help readers triangulate on some hypotheses that are worth further analysis.
Book review: What We Owe the Future, by William MacAskill.
WWOTF is a mostly good book that can’t quite decide whether it’s part of an activist movement, or aimed at a small niche of philosophy.
MacAskill wants to move us closer to utilitarianism, particularly in the sense of evaluating the effects of our actions on people who live in the distant future. Future people are real, and we have some sort of obligation to them.
WWOTF describes humanity’s current behavior as reckless, like an imprudent teenager. MacAskill almost killed himself as a teen, by taking a poorly thought-out risk. Humanity is taking similar thoughtless risks.
MacAskill carefully avoids endorsing the aspect of utilitarianism that says everyone must be valued equally. That saves him from a number of conclusions that make utilitarianism unpopular. E.g. it allows him to be uncertain about how much to care about animal welfare. It allows him to ignore the difficult arguments about the morally correct discount rate.