My recent post Are Intelligent Agents More Ethical? criticized some brief remarks by Scott Sumner.
Sumner made a more sophisticated version of those claims in the second half of this Doom Debate.
His position sounds a lot like the moral realism that has caused many people to be complacent about AI taking over the world. But he’s actually using an analysis that follows Richard Rorty’s rejection of standard moral realism, which seems to mean there’s a weak sense in which morality can be true, but only in a socially and historically contingent fashion. If I understand that correctly, I approve.
An interesting claim from Sumner is that an unaligned AI is likely to have better ethics than an aligned AI. An unaligned AI created by Nazis would empathize with Jews more than would an AI that’s aligned with Nazi values. Modern culture is probably still making some moral mistakes that could be corrected by a wiser AI.
That seems partly right, but it likely depends on the existence of contradictions within the Nazi moral system.
There’s some sense in which moral circle expansion is likely driven by resolving competing principles. We sometimes mistreat animals because we need them as cheap resources to help our children prosper. We sometimes protect animals to signal, to other humans, our general pattern of cooperative intent. The former force has been diminishing as we get wealthier. The importance of signaling has increased as we’ve left small towns and come to interact more with strangers.
Some such signals work well; others provide little or no benefit. Signaling care for dogs provides more benefit for humans than does signaling care for octopuses, for highly contingent reasons, such as dogs’ ability to signal gratitude, and the relative ease of noticing interactions with dogs.
Will superintelligences value that kind of signaling? I don’t see obvious forces leading them to use such signals.
One implication of Sumner’s position is that psychopaths should be incapable of empathy, due to something missing from their cognition.
Psychopaths seem like a good test of Sumner’s position. And it looks like expert opinion rejects Sumner’s position. According to Perplexity.ai:
Psychopaths have the ability to empathize but typically choose not to unless it serves their goals or they are specifically prompted. Their default mode is emotional detachment, but their empathy can be “switched on” by motivation, context, or explicit instruction.
I consider this a clear reason not to expect an automatically expanding moral circle.
Sumner is cautious enough that he’s mainly arguing against highly confident predictions of doom. The uncertainty about trends in moral circles is probably enough to drive the p(doom) of sensible people below 90%. But to get down to 50% or lower, I say you need to analyze what goals AIs are likely to be trained to follow.