Or, why I don’t fear the p-zombie apocalypse.
This post analyzes concerns about how evolution, in the absence of a powerful singleton, might, in the distant future, produce what Nick Bostrom calls a “Disneyland without children”, i.e. a future with many agents whose existence we don’t value because they are missing some important human-like quality.
The most serious description of this concern is in Bostrom’s The Future of Human Evolution. Bostrom is cautious enough that it’s hard to disagree with anything he says.
People sometimes sound like they want to use this worry as an excuse to oppose the age of em scenario, but it applies to just about any scenario with human-in-a-broad-sense actors. If uploading never happens, biological evolution could produce slower paths to the same problem(s). Even in the case of a singleton AI, the singleton will need to resolve the tension between evolution and our desire to preserve our values, although in that scenario it’s more important to focus on how the singleton is designed.
These concerns often assume something like the age of em lasts forever. The scenario which Age of Em analyzes seems unstable, in that it’s likely to be altered by stranger-than-human intelligence. But concerns about evolution only depend on control being sufficiently decentralized that there’s doubt about whether a central government can strongly enforce rules. That situation seems sufficiently stable to be worth analyzing.
I’ll refer to this thing we care about as X (qualia? consciousness? fun?), but I expect people will disagree on what matters for quite some time. Some people will worry that X is lost in uploading, others will worry that some later optimization process will remove X from some future generation of ems.
I’ll first analyze scenarios in which X is a single feature (in the sense that it would be lost in a single step). Later, I’ll try to analyze the other extreme, where X is something that could be lost in millions of tiny steps. Neither extreme seems likely, but I expect that analyzing the extremes will illustrate the important principles.
Case A: A single, identifiable step where what we care about is lost.
Let’s imagine that we have somehow achieved widespread agreement about what X is, and how to observe whether a particular em has X. Let’s also assume that ems without X are more efficient at something relevant to running more copies of themselves, so that an em or biological human who focuses on (imperfect) self-replication (while neglecting X) will be tempted to make X-less copies of herself. That seems like a scenario that could produce a Disneyland filled with X-less beings.
Why would someone neglect X?
Maybe X is inherently unobservable, in which case we’ll never know whether we’ve lost it.
Maybe people only care about X when thinking abstractly. So they’ll lose interest in X when worries about X conflict with other goals. Just like people who are concerned about cameras stealing their souls tend to lose interest in their souls when cameras become useful to them. If people are that inconsistent about X, that seems to be evidence against the importance of X to them.
A more worrying scenario would involve X being hard to measure, but still measurable enough to be clearly real, such as happiness or life satisfaction. It seems possible that the introduction of farming, or the industrial revolution, impaired happiness or life satisfaction in a semi-permanent way. Preventing such change would have required unusual global coordination. I can imagine (although it seems somewhat unlikely) that it would have been wise for people to agree on stopping those revolutions.
The ratio of concerns about X to proposals to protect X suggests that most concern is more like the concern over cameras stealing our souls than like the agricultural revolution. It’s almost as if many people value confusion about X. (I get that impression mainly from casual conversations, not from sources that are worth citing).
What if opinions coalesce into several different tribes, each with a different X? Then an obvious scenario is that evolution will lead to one of those tribes dominating, unless people coordinate to stop evolution.
What research program would help us identify the critical change which removes X, and document it clearly enough that nobody mistakenly loses X?
- happiness/life satisfaction research? Self-reporting measures are not reliable enough on their own – those results need thoughtful cross-checking with people’s behavior. We need a fairly sophisticated model to tell us how to resolve conflicts between people’s reports and their behavior.
- research into CEV-like strategies?
- experiments on ems?
Here are the beginnings of some research that at least creates the appearance of being serious about the subject: http://www.openphilanthropy.org/2017-report-consciousness-and-moral-patienthood, and http://effective-altruism.com/ea/1cn/why_i_think_the_foundational_research_institute/.
Case B: Lots of little pieces of human-ness that we care about, which can be lost in little pieces.
If I give up one millionth of the features I care about, in order to double my amount of the remaining 999,999 features, I end up on a slippery slope. This is roughly Bostrom’s “Scenario 1: The Mindless Outsourcers”. I outsource my navigation to a GPS today. Tomorrow I outsource my blog writing to an AI. Eventually it’s hard to say how much of me remains, and maybe it ends in the moral equivalent of galaxies tiled with smiley faces or giant lookup tables.
This is harder to deal with than case A, since I expect incentives in case B to push us in bad directions. So the rest of this post will focus on scenarios that are relatively close to this case.
Maybe it’s a little consolation that the millionth step in this process reduces to case A, at which point selfish incentives won’t cause much pressure for further harm.
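The arithmetic behind this slippery slope can be made concrete with a minimal sketch. It assumes, purely for illustration, that features are valued additively (each remaining feature counts in proportion to how much of it I have); that valuation rule and the function names are my assumptions, not something the argument depends on. Under that assumption, giving up one feature to double the rest looks like a gain at every step until almost nothing is left.

```python
def step_looks_attractive(remaining: int) -> bool:
    """Hypothetical additive valuation: dropping one feature doubles the rest.

    Before the step I hold `remaining` features at some amount a each.
    After the step I hold `remaining - 1` features at amount 2a each.
    The step looks like a gain when 2 * (remaining - 1) > remaining.
    """
    return 2 * (remaining - 1) > remaining


def features_left_when_incentive_stops(total: int) -> int:
    """Keep trading away one feature at a time while each trade looks like a gain."""
    remaining = total
    while remaining > 0 and step_looks_attractive(remaining):
        remaining -= 1
    return remaining


if __name__ == "__main__":
    # Starting from a million features, the naive incentive only runs out
    # when two features remain.
    print(features_left_when_incentive_stops(1_000_000))
```

With a valuation that places extra weight on keeping the full set of features, the loop would stop much earlier; the sketch only shows why step-by-step incentives alone do not stop it.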
Is coordination feasible?
In a world with no singleton or a weak singleton, there are limits to how much coordination we can hope for.
The main hopeful example where global coordination worked well is CFCs.
It’s easier to find inconclusive results: carbon emissions, nuclear proliferation, athlete steroids, etc.
Success seems to require widespread agreement on what result is valuable and on how to measure it, plus detection and enforcement costs being low compared to the benefits of defecting.
In an age of em, detecting small-scale violations of global agreements seems unpleasantly hard. Some authority would need the ability to inspect virtually any software, and to understand that software well enough that I can’t mislead it about what short-cuts my software is taking.
If such a world can be achieved, it’s probably sustainable. But I expect that a transition to such a world would be fairly costly.
How important is it to detect small-scale violations?
Maybe we can allow some regions (say, a planet?) to have unconstrained software, and only need to check information that leaves such a region? If we need absolute guarantees, then the evidence from computer security seems discouraging.
Maybe we can allow 1 or 2 percent of the universe to be controlled by ems that violate global agreements, and can wait to suppress them until they become abundant. That may require some relatively expensive punishments, but that kind of approach has worked reasonably well for us historically. (E.g. most laws are designed to reduce crime to tolerable levels, not to guarantee zero crime: punishing drunk drivers works better than outlawing alcohol; most large companies have unproductive workers, but that doesn’t imply they’d benefit from zero-tolerance policies toward such workers).
Coordination to stop uploading would involve some very different problems. There would be large advantages to uploading oneself (or to owning/employing ems, if these ems are missing X), so I expect relatively many attempts at violating global agreements. Uploading seems likely to require some fairly specialized hardware, which will initially be hard to build and somewhat easy to regulate. But I’ll guess that it would be hard to permanently outlaw easy-to-build hardware that will enable uploading as a byproduct of other uses (think Drexlerian nanomedicine).
It isn’t clear what would cause a worldwide consensus among governments to reliably outlaw uploading. I find it easy to imagine voters asking most governments to outlaw uploading, but I’m skeptical that those voters would distinguish well-enforced laws from symbolic laws.
If governments fail to stop the first uploads, could they destroy em cities once they take off? I’m quite uncertain. Ems will have different vulnerabilities than biological humans, which does tend to suggest that biological humans would have some hope of victory. But before long ems will think much faster than biological humans, which will probably mean that few advantages of biological humans will remain for long. How well will ems be able to run away and/or hide?
1) disputes over X
A small number of people who don’t care about X make X-less copies of themselves. Most people oppose some aspect of that process, but a small number of people who don’t care about X could out-reproduce the others.
If the X-less copies don’t become especially powerful, it seems like just some people getting luxury goods that the majority doesn’t appreciate. I find it hard to see a problem with this scenario, unless it ends up becoming equivalent to the next scenario.
2) everyone agrees to value X, but X-less ems are used for labor
It’s a bit more disturbing if X-less copies can perform most jobs at lower cost.
Much of this boils down to the standard problem of human jobs being replaced by machines.
This assumes people need to work in order to survive. If something like a UBI removes that need, there might still be reason for concern that machines would fill the universe.
If X-less ems only work under full control of their “owner”, it’s hard to see large problems.
If X-less ems go out to colonize galaxies, then there’s no obvious limit to what fraction of the universe is tiled with X-less ems.
Maybe I create X-less ems to quickly grab and hold a few galaxies until I can make enough genuine copies of myself to fill those galaxies. That’s a decent result if all goes well, but is very wasteful if I die (or otherwise lose control) before getting there and my X-less ems are able to keep those galaxies from being used by anyone else.
This seems mainly a problem when X-less copies are much more powerful than the originals (which may be the case for early ems).
Maybe we should worry about a “slave revolt” if they outnumber (or outsmart) ems with X? Possible solution: designing them to not be powerful enough, and/or using an approach that would satisfy MIRI.
3) Bostrom’s all-work-and-no-fun scenario, where we need to work nearly 100% of the time if we want our copies/descendants to be a significant fraction of future populations.
I’m not too concerned about this scenario, because I think minds can be designed to enjoy work (Turning the Repugnant Conclusion into Utopia) without losing much of what I value. But that attitude requires some unproven assumptions about X, and for the purposes of this post I want to minimize controversial assumptions about X.
One way in which the all-work-and-no-fun scenario can look worse than it is would be if we work hard until all local matter is turned into the most efficient feasible computronium, and we’ve sent out the optimal set of probes to colonize the rest of the universe. Then there’s no good reason to work more, and we can devote all our local resources to fun. I expect this scenario to produce a very good long-term result, although short-sighted people would dislike it.
So, why might Bostrom worry about this scenario?
One way it might go wrong is if we can’t afford the expense of storing the instructions needed to value fun. That seems like a tiny fraction of the cost of implementing intelligence, but tiny disadvantages can add up to important effects if they persist over sufficiently many cycles of reproduction (e.g. over the kind of population growth needed to colonize many galaxies). This is the kind of problem that we tend to underestimate, and could easily become serious if it were underestimated. But I’m relatively optimistic about global coordination being feasible when the costs of obeying global agreements are small.
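A rough way to see how a tiny cost compounds: the sketch below assumes a simple model in which lineages that carry the fun-valuing instructions grow by a factor of (1 - cost) relative to lineages that drop them, in every reproduction cycle. The model, the 0.1% cost, and the cycle counts are hypothetical illustrations, not estimates.

```python
import math


def relative_abundance(cost: float, cycles: int) -> float:
    """Abundance of the costly lineage relative to the cheaper one after `cycles`.

    Assumes (hypothetically) a constant per-cycle growth penalty of `cost`.
    """
    return (1.0 - cost) ** cycles


def cycles_until_fraction(cost: float, fraction: float) -> int:
    """Cycles needed before the costly lineage falls to `fraction` of the cheaper one."""
    return math.ceil(math.log(fraction) / math.log(1.0 - cost))


if __name__ == "__main__":
    # A hypothetical 0.1% per-cycle penalty, compounded over 10,000 cycles.
    print(relative_abundance(0.001, 10_000))
    # Cycles until the costly lineage is outnumbered 100 to 1.
    print(cycles_until_fraction(0.001, 0.01))
```

The point of the sketch is only that the outcome depends on the product of cost and number of cycles, so a cost that is negligible per cycle can still dominate over enough cycles unless something (such as coordination) offsets it.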
Another way it could go wrong is if there’s a need for constant work to defend against hostile agents. My guess is that this will only happen at a few boundaries between very dissimilar civilizations. But I don’t have a compelling argument against it being more widespread.
What if typical regions of space can be improved indefinitely (e.g. maybe we’re mistaken about physics, and computational densities can go to infinity)? Then we could get a weird situation where the amount of fun goes to infinity while also becoming a negligible fraction of all activity. Infinite ethics seems like a hard topic, where I don’t expect to find good answers soon.
Have I captured Bostrom’s main worries?
How urgent is it?
Do we need to implement a full solution early in the age of em? Or can we wait until “bad ems” reach, say, 1% of the population, then suppress them?
I suspect that depends on how costly X is.
If X is lost at uploading, then the difference in replication rates is big enough that I don’t expect them to be stopped after they reach 1% of human populations. So we’d need draconian enforcement mechanisms, and we’d need them fairly soon.
For an altered version of an existing em, I expect any increases in efficiency will be small enough that the fastest running ems will have plenty of time to react.
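The contrast between these two cases can be sketched numerically. The model below is an assumption of mine: X-less ems out-replicate the rest by a fixed ratio per period, so their odds (share divided by one minus share) are multiplied by that ratio each period. The ratios 2.0 and 1.01 are hypothetical stand-ins for a “big” and a “small” efficiency gain.

```python
import math


def periods_to_reach_share(start_share: float, target_share: float,
                           growth_ratio: float) -> int:
    """Periods for a faster-replicating group to grow from one population share to another.

    Assumes (hypothetically) that the group's odds are multiplied by
    `growth_ratio` every period.
    """
    if growth_ratio <= 1.0:
        raise ValueError("growth_ratio must exceed 1 for the share to grow")
    start_odds = start_share / (1.0 - start_share)
    target_odds = target_share / (1.0 - target_share)
    return math.ceil(math.log(target_odds / start_odds) / math.log(growth_ratio))


if __name__ == "__main__":
    # Large advantage: going from 1% to 50% takes only a handful of periods.
    print(periods_to_reach_share(0.01, 0.5, 2.0))
    # Small advantage: the same growth takes hundreds of periods.
    print(periods_to_reach_share(0.01, 0.5, 1.01))
```

Under these assumptions, waiting until 1% leaves very little reaction time when the advantage is large, and a long window when it is small.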
The age of em could be postponed somewhat by worldwide bans (or drastic restrictions?) on scanning hardware, or maybe by regulating big server farms. But those technologies seem destined to eventually become pretty easy for an ordinary person to make via their own 3D printer.
It seems quite possible that all most people will want is that small postponement of the problem, until AGI can provide a solution which involves some better form of coordination. It’s unclear whether it’s feasible to get enough governments to agree on this.
Since I expect the first uploads to have X, I’m not aware of a good reason to support such restrictions.
For changes beyond the initial uploading step, detecting violations will be tricky. See Age of Em (chapter 10, the section on surveillance) for interesting comments on why it might be feasible.
In sum, it would be pretty important to resolve concerns about uploading if there are sensible objections to it. I don’t see much urgency about the remaining issues, although the general issues surrounding global coordination seem important for other reasons (arms races?). Yet it’s likely to remain an important issue for us or our AI overlord(s) to deal with for quite some time.
This post ended up being longer and more ambitious than I originally imagined, and my intuition says there’s an unusually high probability that some part will look foolish in hindsight.
See also Paul Christiano’s “Why might the future be good?” (“What natural selection selects for is patience.”)
 – I suspect much of the disutility of work, as work has been defined over the past few centuries or millennia, comes from a mismatch between the time horizons we’re evolved to focus on, versus the time horizons needed for a modern career. I’d be glad to alter my mind so that it felt more comfortable focusing on longer time horizons.