[I mostly wrote this to clarify my thoughts. I’m unsure whether this will be valuable for readers.]
I expect that within a decade, AI will be able to do 90% of current human jobs. I don’t mean that 90% of humans will be obsolete. I mean that the average worker could delegate 90% of their tasks to an AGI.
I feel confused about what this implies for the kind of AI long-term planning and strategizing that would enable an AI to create large-scale harm if it is poorly aligned.
Is the ability to achieve long-term goals hard for an AI to develop?
By long-term, I’m referring to goals that require both long time horizons and some ability to forecast the results of multiple steps of interventions.
Evidence from Evolution
Evolution provides some evidence that it’s hard.
Few species seem to do anything that requires planning more than a few days in advance. The main examples that I can find of multi-month planning seem sufficiently specialized that they likely involve instincts that can’t be adapted to novel tasks: beavers constructing dams, and squirrels caching food.
Human success suggests there’s value in a more general ability to do long-term planning. So there was likely some selective pressure for it. The time it took for evolution to find human levels of planning suggests that it’s relatively hard.
Human infants have the ability to develop long-term planning abilities. It seems like they would benefit from having those planning abilities at birth, yet they take years to develop. According to ChatGPT:
Early Childhood (3-6 years): As children begin to develop better memory and the ability to project themselves into the future, there’s a budding understanding of time. However, their grasp of longer time periods is still immature. They might understand “tomorrow” but struggle with the concept of “next week” or “next month.”
Middle Childhood (7-10 years): During this phase, children’s understanding of time becomes more sophisticated, and they start to develop the ability to delay gratification and think ahead. For instance, they might save money to buy a desired toy or understand the idea of studying now to do well on a test later. However, their ability to plan for the long-term (e.g., months or years ahead) remains limited.
This evidence suggests that AIs might require longer training times, or more diverse interactions with the world, than I’d expect to be practical within 10 years.
Obstacles to Planning
I asked ChatGPT what obstacles there are to developing AIs that are capable of long-term planning. Its answers included Temporal Credit Assignment, Complexity of the Environment, Exploitation vs. Exploration Dilemma, and Feedback Delays.
I’ll frame my answer differently: it’s hard to develop causal models that are sufficiently general-purpose to handle a wide variety of scenarios.
Will AI be Different?
Much knowledge can be acquired by observing correlations in a large dataset. Current AI training focuses almost exclusively on this.
In contrast, human childhood involves some active interventions on the child’s environment. I expect that to provide better evidence for constructing causal models.
That means that scaling up LLMs to roughly human levels will leave AIs with relatively weak abilities at causal modeling, and therefore relatively weak planning abilities.
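A toy simulation illustrates why interventions give better evidence than correlations. Here a hidden factor z drives both x and y, while x has no effect on y at all. The variable names, the 0.9 fidelity, and the sample size are assumptions chosen for illustration, not anything taken from a real training setup.

```python
import random

def noisy_copy(bit, rng, fidelity=0.9):
    """Return `bit` with probability `fidelity`, otherwise its flip."""
    return bit if rng.random() < fidelity else 1 - bit

def effect_gap(samples):
    """Estimate P(y=1 | x=1) - P(y=1 | x=0) from (x, y) pairs."""
    y_when_x1 = [y for x, y in samples if x == 1]
    y_when_x0 = [y for x, y in samples if x == 0]
    return sum(y_when_x1) / len(y_when_x1) - sum(y_when_x0) / len(y_when_x0)

def observe(n, rng):
    """Passive data: the hidden factor z influences both x and y."""
    samples = []
    for _ in range(n):
        z = rng.randint(0, 1)
        samples.append((noisy_copy(z, rng), noisy_copy(z, rng)))
    return samples

def intervene(n, rng):
    """Active data: the experimenter sets x by coin flip, ignoring z."""
    samples = []
    for _ in range(n):
        z = rng.randint(0, 1)
        x = rng.randint(0, 1)  # chosen independently of z
        samples.append((x, noisy_copy(z, rng)))
    return samples

rng = random.Random(0)
observed_gap = effect_gap(observe(20000, rng))
intervened_gap = effect_gap(intervene(20000, rng))
print(f"observational gap: {observed_gap:.2f}")
print(f"interventional gap: {intervened_gap:.2f}")
```

In the passive data, x looks like a strong predictor of y (a gap of about 0.64 in expectation under these assumptions). Once x is set by the experimenter, the gap falls to roughly zero, which is the correct causal answer. A learner that only sees the passive data has no direct way to tell these situations apart.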
However, I don’t expect AI progress to be exclusively scaling up of LLMs. Robotics seems likely to become important. Robots will have training that causes them to develop more sophisticated causal models than a comparably smart LLM.
Will robots be a separate branch of AI, or will they be integrated with LLM knowledge? I expect at least some integration, if only to make them easy to instruct via natural language. I’m unclear whether there will be strong incentives to keep updating robots with the most powerful LLM-type knowledge.
Will robots be trained to have good causal models of humans? I can imagine that the answer is no, due to the difficulty of modeling humans and the relative simplicity of designing manufacturing plants to be robot-only environments. I have rather low confidence in that forecast.
How general-purpose will robots’ causal models become by default?
Best AI Planning So Far?
I looked for good examples of long-term planning in AIs.
OpenAI’s Minecraft playing system seems relatively impressive. It achieved roughly human-level performance at crafting the diamond pickaxe. Human experts typically need 20 minutes and 24,000 actions to accomplish that.
But how much planning did the AI learn independently? Less than the summary implies. The task requires collecting 11 other items in sequence. It looks like they trained the AI with rewards for each item, so at any one stage of training it was only finding out how to collect one novel item in an otherwise familiar sequence.
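Some rough arithmetic suggests how much per-item rewards shrink the problem. Assume, purely for illustration, a chain of 12 stages (the 11 prerequisite items plus the pickaxe) in which each stage offers 4 candidate actions and only one of them advances. The figure of 4 is invented, and real Minecraft exploration is far messier than blind guessing, so treat this as a sketch of the scaling rather than a model of the actual system.

```python
import random

def expected_episodes_sparse(stages, actions):
    """Blind guessing with one reward at the very end: each episode succeeds
    with probability (1/actions)**stages, so the expected count is its inverse."""
    return actions ** stages

def expected_guesses_staged(stages, actions):
    """One reward per stage: each stage is solved separately, taking
    `actions` guesses on average, and the solution is then kept."""
    return stages * actions

def simulate_staged(stages, actions, rng):
    """Count guesses until every stage has been solved once, in order."""
    guesses = 0
    for _ in range(stages):
        while True:
            guesses += 1
            if rng.randrange(actions) == 0:  # action 0 stands for the correct one
                break
    return guesses

rng = random.Random(0)
runs = [simulate_staged(12, 4, rng) for _ in range(2000)]
average_staged = sum(runs) / len(runs)

print(expected_episodes_sparse(12, 4))  # 16777216
print(expected_guesses_staged(12, 4))   # 48
print(round(average_staged, 1))
```

Under these assumptions, the single-reward version needs about 17 million episodes on average, while the per-item version needs about 48 guesses. That gap is why I read the per-item rewards as doing much of the planning work for the AI.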
It still sounds impressive that they were able to do that, but that’s probably not close to what I’d call long-term planning. This research would have benefited from longer-term planning. Their failure to produce it is another small piece of evidence that long-term planning is hard.
Another Minecraft system, Voyager, plays Minecraft by writing blocks of code for each of the tasks it wants to perform. When performing a task that is composed of several subtasks, it can just reuse the functions it has already written to perform those subtasks. I see some impressive search and composition here, but not much planning.
If I stretch my imagination, I can see some chance that this approach will someday lead to human-level or better planning. But for now, it feels like AIs are planning at the level of a two-year-old human, versus being closer to a four-year-old at other reasoning abilities. I expect that relative maturity to continue for a while.
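To make the distinction between composition and planning concrete, here is a toy skill library in the spirit of that description. It is not Voyager's code or API: the `skill` decorator, the function names, and the inventory dictionary are all hypothetical, and the amount of wood gathered per call is arbitrary.

```python
skills = {}

def skill(fn):
    """Register a function in the library under its own name."""
    skills[fn.__name__] = fn
    return fn

@skill
def gather_wood(inventory):
    # Arbitrary yield, chosen only for the example.
    inventory["wood"] = inventory.get("wood", 0) + 4
    return inventory

@skill
def craft_planks(inventory):
    # Reuse an earlier skill if the ingredient is missing.
    if inventory.get("wood", 0) < 1:
        skills["gather_wood"](inventory)
    inventory["wood"] -= 1
    inventory["planks"] = inventory.get("planks", 0) + 4
    return inventory

@skill
def craft_table(inventory):
    # A composite task is just calls to skills that already exist.
    while inventory.get("planks", 0) < 4:
        skills["craft_planks"](inventory)
    inventory["planks"] -= 4
    inventory["crafting_table"] = inventory.get("crafting_table", 0) + 1
    return inventory

result = skills["craft_table"]({})
print(result)
```

Each function only checks its immediate prerequisite and calls a stored routine. Nothing in the library forecasts the consequences of a multi-step sequence or compares alternative routes, which is the part I would call planning.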
LeCun’s JEPA Model
Yann LeCun has a strategy for developing human-level planning, outlined in A Path Towards Autonomous Machine Intelligence:
Humans and many animals are able to conceive multilevel abstractions with which long-term predictions and long-term planning can be performed by decomposing complex actions into sequences of lower-level ones.
The capacity of JEPA to learn abstractions suggests an extension of the architecture to handle prediction at multiple time scales and multiple levels of abstraction. Intuitively, low-level representations contain a lot of details about the input, and can be used to predict in the short term. But it may be difficult to produce accurate long-term predictions with the same level of details. Conversely high-level, abstract representation may enable long-term predictions, but at the cost of eliminating a lot of details.
LeCun may well have one of the best approaches to human-level long-term planning. If so, his belief that human-level AI is a long way away constitutes some sort of evidence that planning will be slow to develop.
Conclusion
This kind of analysis has unavoidable uncertainties. There might be some simple tricks that make AI planning work better than human planning. But this analysis seems to be the best I can do.
I’m leaning toward expecting a nontrivial period in which AIs have mostly human-level abilities, but are too short-sighted for rogue AIs to be a major problem.
So I expect the most serious AI risks to be a few years further away than what I’d expect if I were predicting based on IQ-style tests.
I have moderate hopes for a period of more than a year in which AI assistants can contribute a fair amount to speeding up safety research.
I realize now that some of this post was influenced by a post that I’d forgotten reading: Causal confusion as an argument against the scaling hypothesis, which does a better job of explaining what I meant by causal modeling being hard.