Descriptions of AI-relevant ontological crises typically choose examples where it seems moderately obvious how humans would want to resolve the crises. I describe here a scenario where I don’t know how I would want to resolve the crisis.
I will incidentally ~~ridicule~~ express distaste for some philosophical beliefs.
Suppose a powerful AI is programmed to have an ethical system based on a version of the person-affecting view: one which says only persons who exist are morally relevant, where “exist” refers only to the present time. [Note that the most sophisticated advocates of the person-affecting view are willing to treat future people as real, and only object to comparing those people to other possible futures where those people don’t exist.]
Suppose also that it is programmed by someone who thinks in Newtonian models. Then something happens which prevents the programmer from correcting any flaws in the AI. (For simplicity, I’ll say the programmer dies, and that the AI was programmed to accept changes to its ethical system only from the programmer.)
What happens when the AI tries to make ethical decisions about people in distant galaxies (hereinafter “distant people”) using a model of the universe that works like relativity?
I’ll focus on a time in our future when people from earth have started colonizing other galaxies, but there’s still plenty of room for population growth. I’ll assume that any morally relevant life originated on earth.
It seems likely that any translation of the AI’s ethics into a relativistic representation of the world will imperfectly approximate the original representation. As the original paper on ontological crisis puts it:
> An agent’s goal, or utility function, may also be specified in terms of the states of, or entities within, its ontology. If the agent may upgrade or replace its ontology, it faces a crisis: the agent’s original goal may not be well-defined with respect to its new ontology. This crisis must be resolved before the agent can make plans towards achieving its goals.
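As a toy illustration of what that crisis looks like (my own construction, not anything from the paper), consider a utility function written against a Newtonian ontology, which has a single global “present”. Swap in a relativistic world model and the function isn’t merely wrong; it is undefined:

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class NewtonianWorld:
    t_now: float                   # one absolute, global "present"
    birth_times: Dict[str, float]  # person -> absolute birth time

def utility(world: NewtonianWorld) -> int:
    # Person-affecting view: count only the people already born "now".
    return sum(1 for t in world.birth_times.values() if t <= world.t_now)

@dataclass
class RelativisticWorld:
    # person -> birth event (t, x) in some arbitrary frame; there is
    # no frame-independent t_now, so "already born" has no referent.
    birth_events: Dict[str, Tuple[float, float]]

# utility() cannot be applied to a RelativisticWorld at all; the agent
# must first invent some replacement for t_now, which is exactly the
# choice the options below (REF, UTC, PLC, FLC, ...) disagree about.
```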
My first guess (which I’ll label REF) is that it will distinguish future from past using time coordinates for whatever reference frame it happens to live in.
That has the perverse result that, if different instances of the same AI have significantly different velocities, they may disagree by centuries about whether the time has come to treat distant people as morally relevant, without disagreeing about any facts. Distant people can even move in and out of moral relevance repeatedly as a function of the earth’s rotation.
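To put rough numbers on that perversity (my own back-of-the-envelope arithmetic, not a result from any source): under special relativity, changing the observer’s velocity by v shifts the coordinate time assigned to “now” at distance d by roughly v·d/c².

```python
# Relativity of simultaneity: a velocity change of v shifts which distant
# events count as "now" by delta_t ~= v * d / c**2 (valid for v << c).

C = 299_792_458.0        # speed of light, m/s
LY = 9.4607e15           # one light year, m
YEAR = 3.156e7           # one year, s

def simultaneity_shift_years(v_m_per_s: float, d_light_years: float) -> float:
    """Shift, in years, of the 'present' at distance d under a velocity change v."""
    return v_m_per_s * (d_light_years * LY) / C**2 / YEAR

# Earth's equatorial rotation speed (~465 m/s) vs. a galaxy 1 billion ly away:
print(simultaneity_shift_years(465.0, 1e9))      # ~1,550 years
# Earth's orbital speed around the sun (~30 km/s):
print(simultaneity_shift_years(30_000.0, 1e9))   # ~100,000 years
```

On those numbers, the ~465 m/s velocity swing from the earth’s rotation moves the “present” at a billion-light-year-distant galaxy back and forth by roughly three thousand years every day, which is what lets distant people flicker in and out of moral relevance.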
That might have some unpleasant consequences if it influenced the AI’s decision to send a von Neumann probe to the distant galaxy. I don’t know how to model that scenario well, but the uncertainties bother me.
Here are some alternative ways the AI might resolve its confusion, depending on how the AI was created (imagine, say, it uses a half-assed version of CEV, or an approach that resembles some of Paul Christiano’s ideas, probably including some reinforcement learning):
- (UTC) Pick a standard reference frame, comparable to Universal Coordinated Time, to use in deciding what constitutes the present.
- (PLC) Decide that only people who were born within the AI’s past light cone are morally relevant.
- (FLC) Decide that anyone who was born outside of the AI’s future light cone is morally relevant. (Both light-cone tests are sketched in code after this list.)
- (IND) Decide that there’s an objectively correct morality that’s incompatible with the person-affecting view, and that it requires the AI to treat the moral value of any person as independent of what time (past or future) that person was born.
- (NON) Decide that the person-affecting view wouldn’t accomplish whatever it was that motivated the programmer to adopt it, and guess as to what the programmer would have chosen if he had given up on the person-affecting view.
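To make PLC and FLC concrete, here is a minimal sketch: my own framing rather than anything from the literature, in one spatial dimension with units where c = 1.

```python
# An event is in the observer's past light cone if light from the event
# could have reached the observer; in the future light cone if a signal
# from the observer could still reach the event. Units: c = 1.

def in_past_light_cone(event, observer):
    t, x = event
    t0, x0 = observer
    return t < t0 and abs(x - x0) <= (t0 - t)

def in_future_light_cone(event, observer):
    t, x = event
    t0, x0 = observer
    return t > t0 and abs(x - x0) <= (t - t0)

def morally_relevant(birth_event, observer, rule):
    if rule == "PLC":  # relevant only if the birth is already observable
        return in_past_light_cone(birth_event, observer)
    if rule == "FLC":  # relevant unless the AI could still affect the birth
        return not in_future_light_cone(birth_event, observer)
    raise ValueError(rule)

# A birth 1 million years ago, 2 million light years away:
observer = (0.0, 0.0)
birth = (-1e6, 2e6)
print(morally_relevant(birth, observer, "PLC"))  # False
print(morally_relevant(birth, observer, "FLC"))  # True
```

The sample birth event is spacelike-separated from the observer: its light hasn’t arrived yet, so PLC excludes the person, but the AI also can’t prevent the birth, so FLC includes them.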
Option REF seems to be what the AI would choose if it were just trying to find the criteria that most resembled the criteria that the programmer intended.
The other options reflect an AI that cares about the goals which motivated the programmer.
UTC contains an additional concept which looks somewhat arbitrary. In fact, I’m confused as to how difficult it is to find a reference frame that doesn’t yield weird results. E.g. I can imagine that many reference frames eventually enter black holes, or otherwise have strange motion relative to distant galaxies.
PLC makes some sense if the person-affecting view is motivated by a notion that people aren’t real until we could in principle have observed them. However, I find it unlikely that this is the main motivation behind the person-affecting view.
Support for FLC seems to require caring about whether we can still affect whether a person will be born. I don’t see much reason to expect this to be popular. I suspect many advocates of the person-affecting view will dislike its implication that we must care about the possibly astronomical numbers of descendants of distant people.
IND and NON reflect an unsettling lack of control over the AI’s ethics. I’m confused as to whether we should try to prevent them (obviously we should try to prevent them by thinking clearly about ethics, but I’m assuming we won’t do a perfect job of that in time for the first human-level AIs).
I’d select NON as my preferred solution if the programmer would in fact have abandoned the person-affecting view. But since I’m constructing this scenario to be deliberately hard, I’m going to assert that the programmer would have continued to reasonably believe that some of his goals were better served by one of the other options.
Note that I haven’t claimed to have demonstrated that any of these options are wrong, only that they all seem somewhat counter-intuitive.
I put “exist” in quotes earlier in this post, because thinking in terms of relativity makes it feel mind-boggling to classify people as real or not real depending on some time coordinate associated with their birth. Yet I’m capable of switching back to a Newtonian model, in which it seems intuitively reasonable.
I think that philosophers have been discussing some related issues in the context of the A Theory and the B Theory of time. I don’t understand those debates very well, but it looks like relativity requires rejecting A Theory.
I don’t approve of the person-affecting view, but I can’t quite claim it’s wrong enough to justify rejecting it if it’s part of humanity’s CEV.