Book review: Surfing Uncertainty: Prediction, Action, and the Embodied Mind, by Andy Clark.
Surfing Uncertainty describes minds as hierarchies of prediction engines. Most behavior involves interactions between two streams: one in which low-level sensory data adjust higher-level predictive models of the world, and another in which those high-level models guide low-level sensory processes toward the most likely interpretations of ambiguous sensory evidence.
Clark calls this a predictive processing (PP) model; others refer to it as predictive coding.
The book is full of good ideas, presented in a style that sapped my curiosity.
Jeff Hawkins has a more eloquent book about PP (On Intelligence), which focuses on how PP might be used to create artificial intelligence. The underwhelming progress of the company Hawkins started to capitalize on these ideas suggests it wasn’t the breakthrough that AI researchers were groping for. In contrast, Clark focuses on how PP helps us understand existing minds.
The PP model clearly has some value. The book was a bit more thorough than I wanted at demonstrating that. Since I didn’t find that particularly new or surprising, I’ll focus most of this review on a few loose threads that the book left dangling. So don’t treat this as a summary of the book (see Slate Star Codex if you want that, or if my review is too cryptic to understand), but rather as an exploration of the questions that the book provoked me to think about.
Attention
Clark says a good deal about how attention works. It involves high gain[1] for pathways with the most reliable (in expectation) evidence of relevant prediction errors. “But attention is not, PP suggests, itself a mechanism so much as a dimension of a much more fundamental resource.”
That clarifies a modest portion of “attention” for me, but I have little confidence that Clark or I have summarized it well enough that you can get much out of it short of wading through most of the book.
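The gain idea can be made concrete with a toy sketch. This is my own illustration, not Clark's (or Friston's) formalism: names like `precision` and the update rule are my assumptions. The point is just that "attention" corresponds to the weight (gain) placed on a prediction-error channel, and that a well-predicted channel carries no error to attend to, whatever its gain.

```python
# Toy sketch of precision-weighted prediction error (my illustration,
# not the book's formalism). One layer predicts a sensory value;
# "attention" corresponds to the gain (precision) applied to the
# resulting prediction error.

def update(prediction, observation, precision, lr=0.1):
    """Return (new_prediction, weighted_error).

    precision: how reliable we expect this error channel to be.
    High precision amplifies the error so it drives learning;
    low precision means the error is largely ignored.
    """
    error = observation - prediction
    weighted_error = precision * error
    return prediction + lr * weighted_error, weighted_error

# Steady breathing: observations match predictions, so even a
# high-precision channel carries almost no error to attend to.
pred = 1.0
pred, err = update(pred, observation=1.0, precision=2.0)
assert err == 0.0  # nothing surprising, hence nothing to attend to
```

On this sketch, meditation amounts to assigning high precision to a channel (breathing) that almost never produces an error worth propagating, which is why attention keeps sliding off to noisier channels.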
This model explains why meditation is hard: I have to convince myself that evidence about my breathing (or whatever I focus on) will generate good evidence that I made a mistaken prediction. Yet it’s fairly hard to make a faulty prediction about how my breathing will function over the next second or two. Each layer in the hierarchy will mostly just observe the expected evidence about breathing, which means there’s no need to signal any surprises to other parts of the hierarchy, and only the lowest levels in the hierarchy have any error signal to attend to. So the high-level parts of the mind will by default divert attention toward some unrelated context where it might find some important error signals (Did I turn the stove off? Do I have time to respond to that email? Is Trump causing civilization to collapse?).
Autism, ADHD, Schizophrenia
Clark describes schizophrenia and autism as opposites on a spectrum that ranges from high reliance on high-level priors and low attention to sensory data (schizophrenia), to high attention to sensory data and low reliance on high-level priors (autism).
Clark’s description provides a nice simple model which predicts many of the sensory and social peculiarities of autism.
E.g. for conversation in noisy environments, the average person relies a fair amount on high-level guesses about what the speaker might say, whereas an autistic person will focus more heavily on parsing the immediate sounds. PP says the high-level guesses provide important guidance to the lower-level processing, whereas most alternatives to PP assume a one-way flow of information from sensory inputs to high-level modules. The PP explanation follows naturally from Clark's description of the spectrum, whereas the one-way information flow models seem to need an ad-hoc assumption about something being broken in autistic speech processing.
Small initial differences in processing conversations will build up over time as the average person learns more about modeling minds (by feedback about whether high-level guesses are accurate), while the autistic person will specialize more in distinguishing each individual word from the background noise. That seems almost enough to explain the social differences between autistic people and average people. But I suspect this is just one of many differences (another might be distraction by prediction errors from background noise).
But what do I make of the reports that autistic people are more likely to have alexithymia, are less aware of thirst, and possibly have less introspective awareness in general? While emotions are often generated by high-level thoughts, detecting emotions seems more like sensory perception. And reduced awareness of thirst seems really hard to reconcile with a model that predicts sense data are more salient.
One possibility is that some parts of the autistic brain do detect emotions, thirst, etc., properly, and the atypical results are due to that information not spreading into the global workspace. But that doesn’t sound like an explanation that Clark would endorse. Attention and the global workspace aren’t quite synonyms for consciousness, but they’re close enough that I have some presumption against believing that we pay attention to something without having it in our global workspace. I’d expect Clark to admit that autistic people aren’t paying much attention to emotions, thirst, etc.
The other possibilities that I see suggest that something is missing from Clark’s “sense data to high-level priors” axis. Maybe it needs to be supplemented with an additional axis?
A better model would imply more attention to sources of evidence that provide frequent surprises.
It seems like any good model of these phenomena should also provide insights about ADHD. Clark doesn't mention ADHD, but he provided the right hints for me to find this paper:
Based on the predictive coding account, top-down expectation abnormalities could be attributed to a disproportionate reliance (precision) allocated to prior beliefs in ASD and to sensory input in ADHD. … Specifically, difficulties in generating predictions would increase reliance on novel sensory evidence. Accordingly, ADHD individuals (and contrary to ASD subjects) exhibit higher or even exaggerated neural responses to novel/unexpected stimuli (Gumenyuk et al. 2005) and lower responses to expected cues (Marzinzik et al. 2012). … Based on predictive coding, our results suggest that ASD individuals could be impaired in their ability to adjust precision if faced with uncertainty due to inflexible expectation (Van de Cruys et al. 2014). In other words, the tendency to inhibit bottom-up influences and the attentional bias toward expected stimuli may trigger difficulties in adjusting precision in changing real-world environments. … Based on the predictive coding framework, our results suggest that difficulties in top-down expectation in children with ADHD are due to high precision ascribed to novel sensory evidence relative to task instructions.
This leads me to suspect that Clark’s axis describes a schizophrenia – ADHD spectrum, and that we need an additional axis to adequately model autism.
I suggest that axis be based on how strongly a person wants to minimize prediction errors, with autistic people having a high drive to minimize error, and schizophrenic and ADHD people having relatively low drives to do so.
These two axes imply that for sensory data which produce frequent surprises, such as speech, shiny objects, or scratchy clothing, both ADHD and autistic people pay relatively high attention to the sensory data. Whereas for relatively predictable sensory inputs, such as thirst or happiness, autistic people are more on the schizophrenic end of Clark's spectrum. That similarity in attentional attitudes toward speech helps explain why there's some overlap between ADHD and autism, even though they are opposites in many ways.
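The two-axis proposal can be sketched as a toy scoring function. This is my own hypothetical formalization, not anything from the book: the function, its inputs, and the numbers are all assumptions chosen only to show how the axes could combine.

```python
# Hypothetical two-axis sketch of the model proposed above (my own
# toy formalization, not from the book). Attention to a sensory
# channel grows with how often it produces surprises, modulated by
# where a person sits on each axis.

def attention(surprise_rate, prior_reliance, error_drive):
    """Toy attention score for one sensory channel.

    surprise_rate: how often the channel produces prediction errors.
    prior_reliance: Clark's axis (1.0 = schizophrenia end, 0.0 = ADHD end).
    error_drive: proposed second axis, the drive to minimize error
                 (high for autism, low for ADHD and schizophrenia).
    """
    bottom_up = (1.0 - prior_reliance) * surprise_rate
    return bottom_up + error_drive * surprise_rate

# Speech is surprise-rich: both an ADHD-like profile (low prior
# reliance) and an autism-like profile (high error drive) attend a lot.
adhd_speech = attention(0.9, prior_reliance=0.1, error_drive=0.2)
autistic_speech = attention(0.9, prior_reliance=0.5, error_drive=0.8)
# Thirst is predictable: a low surprise rate leaves little to attend
# to even for the autism-like profile, matching the reports above.
autistic_thirst = attention(0.1, prior_reliance=0.5, error_drive=0.8)
assert autistic_speech > autistic_thirst
```

The design choice is that both axes multiply the surprise rate, so predictable channels get little attention under any profile, while surprise-rich channels separate the profiles.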
I’m a little unclear here whether I’m using “minimize error” to mean minimize the number of prediction errors, or some estimate of the magnitude of those errors. Most likely I mean the number of errors that exceed whatever threshold is used to decide whether to propagate an error to some higher layer in the hierarchy. [2]
One interesting implication is that autistic people should be relatively comfortable in social contexts that involve singing familiar songs. I’m unclear whether anyone has good evidence about this, and I expect most autistic people to be too focused on the social context to observe any effects on comfort levels.
I'm sure that autism, ADHD, and schizophrenia have a good deal of complexity that isn't explained by this model, but the broad outlines seem at least moderately close to being right, with a satisfying degree of simplicity.
Miscellaneous points
Clark describes (in Section 4.8) how the mind’s equivalent of a utility function is intertwined with expectations about the world, especially expectations about how our bodies will act, in ways that impair our ability to observe a clear-cut utility function. He strongly hints that any alternative would require more computational resources and/or react more slowly. That’s bad news for hopes that we’ll have a simple way to understand what our robot overlords want.
The PP model helps reduce some of the confusion in the nature-nurture debate by illustrating how highly abstract “hyperpriors” might be mostly learned (before any concrete instances are learned), yet look like innate knowledge to anyone who doesn’t understand the PP model. E.g. we might learn that objects tend to be “cohesive, bounded, and rigid” before learning concepts such as “balls, discs, stuffed toys”. We really need someone to simplify this idea enough so that the average person who engages in nature-nurture debates can understand it. But that seems beyond Clark’s skill or mine.
The PP model helps clarify why we mostly perceive the world as unitary, rather than perceiving probability density functions. Seeing probability densities would be more helpful if perception were implemented separately from action. But we wouldn’t function well if our commands to our muscles reflected a probability density over what word to say next. The intertwining of our perception with those commands pushes us toward selecting a single best interpretation of our sensory input, even when a probability distribution would more accurately reflect what we know.
PP helps explain the concept of choking: our muscle movements are generated to match our predictions about how our body will move. If we devote attention to the discrepancy between that predicted motion and our current body position, that will draw attention away from the lower-level discrepancies that are used to guide our muscles to adjust to match their predicted state. Note that this implies some difference between choking at sports versus choking on an SAT test. PP can probably explain choking on the SAT as a weaker version of choking on fluent movements – the part about high-level error signals interfering with more valuable lower-level error signals still seems to apply.
The book is mildly helpful at explaining some cognitive “illusions”. E.g. the size-weight illusion results from people using a concept that’s better labeled “throwability” when asked about weight.
Conclusion
The book devotes too much attention to the basic question of whether PP describes the basics of human minds, and not quite enough exploration into how to apply PP to a wide variety of contexts.
I didn’t need conclusive evidence that the basic PP model was realistic, because once I understood it, it seemed obvious that it enables faster reaction times than alternative models, and there are strong evolutionary pressures for fast reaction times.
I would rather have read a book that focused more on understanding the implications of the PP model.
Footnotes
[1] – I’m a bit fuzzy on what Clark means by gain here – information gain seems most plausible, but the electronics meaning also seems relevant. This is an area where Clark is paraphrasing notoriously hard-to-understand ideas from Friston without much improving their clarity.
[2] – One part of the book describes a potential criticism of PP in which PP implies we’ll want to minimize prediction errors by hiding in a dark room (which temporarily minimizes prediction errors, at the cost of increasing prediction errors when hunger/thirst become important). Autism seems to involve an above-average preference for this dark room strategy. (See also Toward a Predictive Theory of Depression).
P.S. While researching this, I stumbled on a story titled Vaccinations Made My Cat Autistic, which I would rephrase as “Admiral Ticklebelly holds a grudge against a human over being vaccinated”.