Descriptions of AI-relevant ontological crises typically choose examples where it seems moderately obvious how humans would want to resolve the crises. I describe here a scenario where I don’t know how I would want to resolve the crisis.
I will incidentally ridicule, or at least express distaste for, some philosophical beliefs.
Suppose a powerful AI is programmed to have an ethical system with a version of the person-affecting view: a version which says only persons who exist are morally relevant, and “exist” only refers to the present time. [Note that the most sophisticated advocates of the person-affecting view are willing to treat future people as real, and only object to comparing those people to other possible futures where those people don’t exist.]
Suppose also that it is programmed by someone who thinks in Newtonian models. Then something happens which prevents the programmer from correcting any flaws in the AI. (For simplicity, I’ll say the programmer dies, and the AI was programmed to only accept changes to its ethical system from the programmer).
What happens when the AI tries to make ethical decisions about people in distant galaxies (hereinafter “distant people”) using a model of the universe that works like relativity?
I’ll focus on a time in our future when people from earth have started colonizing other galaxies, but there’s still plenty of room for population growth. I’ll assume that any morally relevant life originated on earth.
It seems likely that any translation of the AI’s ethics into a relativistic representation of the world will imperfectly approximate the original representation. As the original paper on ontological crisis puts it:
An agent’s goal, or utility function, may also be specified in terms of the states of, or entities within, its ontology. If the agent may upgrade or replace its ontology, it faces a crisis: the agent’s original goal may not be well-defined with respect to its new ontology. This crisis must be resolved before the agent can make plans towards achieving its goals.
My first guess (which I’ll label REF) is that it will distinguish future from past using time coordinates for whatever reference frame it happens to live in.
That has the perverse result that, if different instances of the same AI have significantly different velocities, they may disagree by centuries about whether the time has come to treat distant people as morally relevant, without disagreeing about any facts. Distant people can even move in and out of moral relevance repeatedly as a function of the earth’s rotation.
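As a rough illustration of the size of this effect (the galaxy distance and velocity differences below are assumptions added purely for the example, not anything fixed by the scenario), the first-order simultaneity shift between two frames with relative speed v is Δt ≈ vΔx/c²:

```python
# Rough, illustrative numbers: the first-order relativity-of-simultaneity
# shift between two frames with relative speed v is delta_t ~= v * delta_x / c**2.

C = 299_792_458.0        # speed of light, m/s
LIGHT_YEAR = 9.4607e15   # metres per light-year
YEAR = 365.25 * 24 * 3600

def simultaneity_shift_years(v_m_per_s: float, distance_ly: float) -> float:
    """How far apart (in years) two frames place 'now' at a distant location."""
    return v_m_per_s * distance_ly * LIGHT_YEAR / C**2 / YEAR

# Two AI instances whose velocities differ by Earth's orbital speed (~30 km/s),
# judging the present in a galaxy ~10 million light-years away:
print(simultaneity_shift_years(30_000, 1e7))  # roughly 1000 years

# A velocity change of ~0.9 km/s, about what the earth's rotation imposes
# between noon and midnight at the equator:
print(simultaneity_shift_years(900, 1e7))     # roughly 30 years, oscillating daily
```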
That might have some unpleasant consequences if it influenced the AI’s decision to send a von Neumann probe to the distant galaxy. I don’t know how to model that scenario well, but the uncertainties bother me.
Here are some alternative ways the AI might resolve its confusion, depending on how the AI was created (imagine, say, it uses a half-assed version of CEV, or an approach that resembles some of Paul Christiano’s ideas, probably including some reinforcement learning); a rough code sketch of these options follows the list:
- (UTC) Pick a standard reference frame, comparable to Coordinated Universal Time, to use in deciding what constitutes the present.
- (PLC) Decide that only people who were born within the AI’s past light cone are morally relevant.
- (FLC) Decide that anyone who was born outside of the AI’s future light cone is morally relevant.
- (IND) Decide that there’s an objectively correct morality that’s incompatible with the person-affecting view, and that it requires the AI to treat the moral value of any person as independent of what time (past or future) that person was born.
- (NON) Decide that the person-affecting view wouldn’t accomplish whatever it was that motivated the programmer to adopt it, and guess as to what the programmer would have chosen if he had given up on the person-affecting view.
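To make the distinctions concrete, here is a minimal sketch of the options as boolean “morally relevant” predicates over birth events. The event representation and function names are illustrative assumptions of mine, and NON is omitted because it is a meta-level move rather than a predicate:

```python
from dataclasses import dataclass

C = 1.0  # units where the speed of light is 1 (e.g. years and light-years)

@dataclass
class Event:
    t: float  # time coordinate in whatever frame is being used
    x: float  # spatial distance from the AI in that frame

def ref(birth: Event, now_in_my_frame: float) -> bool:
    # REF: relevant iff born before "now" as judged in the AI's own rest frame.
    # Frame-dependent: two AI instances with different velocities can disagree.
    return birth.t <= now_in_my_frame

def utc(birth: Event, now_in_standard_frame: float) -> bool:
    # UTC: the same test, but in one fixed, conventionally chosen frame.
    return birth.t <= now_in_standard_frame

def plc(birth: Event, here_now: Event) -> bool:
    # PLC: relevant iff the birth lies in the AI's past light cone,
    # i.e. light from the birth has had time to reach the AI.
    return (here_now.t - birth.t) * C >= abs(here_now.x - birth.x)

def flc(birth: Event, here_now: Event) -> bool:
    # FLC: relevant iff the birth is NOT in the AI's future light cone,
    # i.e. nothing the AI does now can affect whether it happens.
    in_future_cone = (birth.t - here_now.t) * C >= abs(birth.x - here_now.x)
    return not in_future_cone

def ind(birth: Event) -> bool:
    # IND: birth time is irrelevant; every person counts.
    return True
```

Note that plc and flc give the same verdict in every reference frame, while ref depends on the AI’s velocity and utc depends on an arbitrary choice of standard frame.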
Option REF seems to be what the AI would choose if it were just trying to find the criteria that most resembled the criteria that the programmer intended.
The other options reflect an AI that cares about the goals which motivated the programmer.
UTC contains an additional concept which looks somewhat arbitrary. In fact, I’m confused as to how difficult it is to find a reference frame that doesn’t yield weird results. E.g. I can imagine that many reference frames eventually enter black holes, or otherwise have strange motion relative to distant galaxies.
PLC makes some sense if the person-affecting view is motivated by a notion that people aren’t real until we could in principle have observed them. However, I find it unlikely that this is the main motivation behind the person-affecting view.
Support for FLC seems to require that the view be motivated by concerns about whether we can still affect whether a person will be born. I don’t see much reason to expect this to be popular. I suspect many advocates of the person-affecting view will dislike its implication that we must care about possibly astronomical numbers of descendants of distant people.
IND and NON reflect an unsettling lack of control over the AI’s ethics. I’m confused as to whether we should try to prevent them (obviously we should try to prevent them by thinking clearly about ethics, but I’m assuming we won’t do a perfect job of that in time for the first human-level AIs).
I’d select NON as my preferred solution if the programmer would have abandoned the person-affecting view. But since I’m constructing this scenario to be deliberately hard, I’m going to assert that the programmer would have continued to reasonably believe that some of his goals would have been helped by choosing a different option.
Note that I haven’t claimed to have demonstrated that any of these options are wrong, only that they all seem somewhat counter-intuitive.
I put “exist” in quotes earlier in this post, because thinking in terms of relativity makes it feel mind-boggling to classify people as real or not real depending on some time coordinate associated with their birth. Yet I’m capable of switching back to a Newtonian model, in which it seems intuitively reasonable.
I think that philosophers have been discussing some related issues in the context of the A Theory and the B Theory of time. I don’t understand those debates very well, but it looks like relativity requires rejecting A Theory.
I don’t approve of the person-affecting view, but I can’t quite claim it’s wrong enough to justify rejecting it if it’s part of humanity’s CEV.
I am unclear on some details of your scenario:
– Did the lawgiver (programmer) first program the AI (or, finally lose control of it) at a time when humans had already expanded to another galaxy? (It’s not realistic that such a programmer would fail to address relativity, but I’ll ignore that.)
– When you say that your scenario should assume
A version [of the “person-affecting view”] which says only persons who exist are morally relevant, and “exist” only refers to the present time
do you mean the (constant) present time when the programmer made this definition and/or lost control, or the (varying) present time at which the AI is applying this law?
(In the latter case, even in a Newtonian framework, the programmer has set things up so that human lives keep coming into and going out of moral relevance (when they are created or destroyed), but I guess they consider that a feature. I guess you mean the latter, since that reading makes your scenario difficult even in simpler cases than the other reading would.)
==
If the definition of “what is morally relevant” can change with the velocity of some object (some instance of an AI, or some part of it doing some thinking), then that object ought to never change its velocity, since bad effects will come from perhaps-intentional changes to what it considers morally relevant.
(Even worse, what if the AI uses a distributed thinking process, and different parts of that process have different velocities?)
So any ethics in which a velocity change of the thinker would lead to a change in what it considers morally relevant is at best a very bad idea, and at worst nonsensical.
The concept of “what’s in my future lightcone” doesn’t change with my velocity. It is also roughly equivalent to “what my present or future actions might possibly affect”, which is a natural limit on what it might be useful for me to consider morally relevant.
The concept of “what already exists” does change with my velocity, in a relativistic view, but not in a Newtonian view. So *we* might agree it’s nonsensical to use it in defining ethics, but your hypothetical programmer would not have thought that, and thus might have used that concept.
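As a quick numerical illustration of both points (the event coordinates and boost speeds are arbitrary choices for the example):

```python
import math

# Natural units: c = 1, so times are in years and distances in light-years.

def boost(t: float, x: float, v: float) -> tuple[float, float]:
    """Lorentz-boost an event's coordinates into a frame moving at speed v
    (|v| < 1) along the x-axis."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (t - v * x), gamma * (x - v * t)

def in_future_light_cone(t: float, x: float) -> bool:
    """Frame-independent notion: is the event inside the future light cone
    of the origin?"""
    return t > 0 and t * t - x * x >= 0

def already_exists(t: float, x: float) -> bool:
    """Frame-dependent notion: does the event have a non-positive time
    coordinate, i.e. has it 'already happened' in this frame?"""
    return t <= 0

# An event one year 'in the past' but a thousand light-years away:
t0, x0 = -1.0, 1000.0
for v in (0.0, 0.001, -0.002):
    tb, xb = boost(t0, x0, v)
    print(f"v={v:+.3f}  future_cone={in_future_light_cone(tb, xb)}  "
          f"already_exists={already_exists(tb, xb)}")
# Light-cone membership comes out the same in every frame; 'already exists'
# flips when the boost direction is reversed.
```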
If they did use that concept, but if the AI interprets what they “said” with the knowledge that they had a Newtonian view (which seems inevitable, if what they “said” is expressed in some ontology that is based on that view), then clearly what they meant by “what already exists” should be interpreted in a Newtonian way (and in whichever frame of reference that original lawgiver would have considered as “the absolute frame of reference”).
(That removes some of the problem, but not all, eg if spacetime is significantly non-flat, or if new observations suggest that the frame the original lawgiver would have considered “the absolute frame of reference” (when he made that law) is different from what he actually thought at the time. And guessing what frame of reference he originally treated as absolute might itself be ambiguous.)
This means I disagree with you that “Option REF seems to be what the AI would choose if it were just trying to find the criteria that most resembled the criteria that the programmer intended.” I think option UTC is that.
Your statement that you are “confused as to how difficult it is to find a reference frame that doesn’t yield weird results. E.g. I can imagine that many reference frames eventually enter black holes, …” doesn’t make complete sense to me — a reference frame is not a “worldline”, so I’m not sure what it means for it to “enter a black hole”. I think any reference frame (in special relativity or in a Newtonian view) is a mapping from *all* points of spacetime (or at least, as many of them as it is able to make sense for) into a tuple of space and time coordinates. So black holes are more likely to be localized defects in a reference frame, than something encompassing its entire future. A reference frame’s overall extent “in the present” would only be limited by large-scale curvature of space.
==
A lot depends on how this original law (given to the AI) was expressed, and on the more general question of how the AI is supposed to interpret an expressed law. (I realize that your main motivation for raising this scenario might be to help shed light on those issues.)
A more general question: if the AI stops understanding how to apply its law correctly (which might happen if the universe turns out to have a completely unanticipated structure — this scenario is a special case), should it continue to exert its full power towards applying some law which seems like a close approximation, or should it just stop exerting its power towards any goal? (Or, does it have, from the start, a succession of more and more general/vague laws, so that it can continue to use whichever most specific law still makes sense to it?)
==
You say you “don’t approve of the person-affecting view”. It seems to me your thoughts on this depend a lot on what you imagine motivates that view. Probably you need to express those more clearly — or perhaps more usefully, express some related view that you *do* approve of (and therefore, we can hope, understand better).
If your goal is specifically a case study of “how should an AI reinterpret a nonsensical law”, then to decide what you want it to do, you at least have to understand (hypothetically in your scenario) the worldview and motivations of the lawgiver.
==
I think that any expression of an ethical law L (intended to guide behavior), within an ontology which sees the world as having type X, might as well take this general form:
“the world is roughly like X; find a world like X which explains your sensory inputs and which you think is affected by your actions; then, within that world, use law L (which is expressed in terms of that worldtype X) to guide your actions.”
(By “find a world like X” I really mean “find a way of interpreting all your inputs and actions as if they came from and affected a world like X”.)
In that framework, your problem reduces to how we express L and X, and how we find (and keep re-finding, as we learn more) a world like X in the “real world” (the world affecting the AI’s inputs and affected by its actions).
It seems to me your problem is mainly concerned with how to find a simpler world as a point of view in which to see and affect a more complex one. So the original lawgiver has to decide on some general strategy for doing that, or at least some criterion by which to judge such strategies. That will be needed for far more mundane issues than this one, since every L we normally express uses high-level concepts (eg “humans”) which need to be “found” in much lower-level sensory input and actions.
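A minimal sketch of that division of labour, with every name and signature invented purely for illustration (nothing here is specified by the scenario):

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

X = TypeVar("X")      # the worldtype the law is expressed in (e.g. a Newtonian model)
Obs = TypeVar("Obs")  # the agent's raw sensory inputs
Act = TypeVar("Act")  # the actions the agent can actually take

@dataclass
class ExpressedLaw(Generic[X, Obs, Act]):
    """'The world is roughly like X; find a world like X which explains your
    inputs; then use L, expressed in terms of X, to guide your actions.'"""
    # A way of interpreting all inputs as if they came from a world of type X.
    find_world: Callable[[Obs], X]
    # The law L itself: given a candidate X-world that some action would lead
    # to, how desirable is it?  L only ever sees worlds of type X.
    law: Callable[[X], float]
    # The agent's (fallible) model of how its actions change the X-world.
    predict: Callable[[X, Act], X]

    def choose(self, observations: Obs, options: list[Act]) -> Act:
        world = self.find_world(observations)
        return max(options, key=lambda a: self.law(self.predict(world, a)))
```

The ontological-crisis question is then what the agent should do when find_world stops returning anything that deserves to be called a world like X.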
>- Did the lawgiver (programmer) first program the AI (or, finally lose control of it) at a time when humans had already expanded to another galaxy?
No, I’m assuming the AI is programmed in our near future, and that it stays powerful for a long time.
> the (varying) present time at which the AI is applying this law?
Yes.
>This means I disagree with you that “Option REF seems to be what the AI would choose if it were just trying to find the criteria that most resembled the criteria that the programmer intended.” I think option UTC is that.
I guess I shouldn’t be surprised at disagreement, given that part of my point is that the programmer was confused in a way that I’m uncertain how to fix.
>a reference frame is not a “worldline”, so I’m not sure what it means for it to “enter a black hole”. I think any reference frame (in special relativity or in a Newtonian view) is a mapping from *all* points of spacetime (or at least, as many of them as it is able to make sense for) into a tuple of space and time coordinates. So black holes are more likely to be localized defects in a reference frame, than something encompassing its entire future. A reference frame’s overall extent “in the present” would only be limited by large-scale curvature of space.
I meant a standard that doesn’t try to force spacetime to be flat. But I’m unsure how to express precisely what I mean, so maybe I’m confused.
>A lot depends on how this original law (given to the AI) was expressed, and on the more general question of how the AI is supposed to interpret an expressed law. (I realize that your main motivation for raising this scenario might be to help shed light on those issues.)
Exactly.
>You say you “don’t approve of the person-affecting view”. It seems to me your thoughts on this depend on a lot on what you imagine motivates that view. Probably you need to express those more clearly — or perhaps more usefully, express some related view that you *do* approve of
I want to say that it’s generally better for a person to exist than to not exist; advocates of the person-affecting view want to avoid that rule.
Some advocates of the person-affecting view also seem to want to avoid needing to account for potentially astronomical numbers of distant future people. I treat them as morally relevant (probably with some time-discounting). Some of this is related to disagreements about the Repugnant Conclusion.