OpenAI has told us in some detail what they’ve done to make GPT-4 safe.
This post will complain about some misguided aspects of OpenAI’s goals.
Book review: How Social Science Got Better: Overcoming Bias with More Evidence, Diversity, and Self-Reflection, by Matt Grossmann.
It’s easy for me to become disenchanted with social science when so much of what I read about it is selected from the most pessimistic and controversial reports.
With this book, Grossmann helped me to correct my biased view of the field. While plenty of valid criticisms have been made about social science, many of the complaints lobbed against it are little more than straw men.
Grossmann offers a sweeping overview of the progress that the field has made over the past few decades. His tone is optimistic and hearkens back to Steven Pinker’s The Better Angels of Our Nature, while maintaining a rigorous (but dry) style akin to the less controversial sections of Robin Hanson’s Age of Em. Throughout the book, Grossmann aims to outdo even Wikipedia in his use of a neutral point of view.
Book review: Noise: A Flaw in Human Judgment, by Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein.
Doctors are more willing to order a test for patients they see in the morning than for those they see late in the day.
Asylum applicants’ chances of prevailing may be as low as 5% or as high as 88%, purely depending on which judge hears their case.
Clouds Make Nerds Look Good, in the sense that university admissions officers give higher weight to academic attributes on cloudy days.
These are examples of what the authors describe as an important and neglected problem.
A more precise description of the book’s topic is variations in judgment, with judgment defined as “measurement in which the instrument is a human mind”.
It’s been a decade since I blogged about the benefits of avoiding news.
In that time I mostly followed the advice I gave. I kicked my addiction to The Daily Show in late 2016 after it switched from ridiculing Trump to portraying him as scary (probably part of a general trend for the show to be less funny). I got more free time, and only missed the news a little bit.
Then the pandemic hit.
I suddenly needed lots of new information. Corporate earnings releases were too slow.
Wikipedia, Our World in Data, Metaculus, and some newly created COVID-specific web sites partly filled that gap. But I still needed more, and I mostly didn’t manage to find anything that was faster or more informative than the news media storyteller industry.
That at least correlated with higher than normal stress. I suspect that paying attention to the storytellers partly caused the stress.
Here are a bunch of loosely organized ideas that didn’t have a natural place in my review of WEIRDest People.
WEIRDest People is almost as important for understanding human biases as is Thinking Fast and Slow.
I said in my review of WEIRDest People that the Flynn effect seems like a natural consequence of thinking styles that became more analytical, abstract, reductionist, and numerical.
I’ll expand here on some questions which I swept under the rug, so that I could keep that review focused on the book’s most important aspects.
After reading WEIRDest People, I find that the goal of a culture-neutral IQ test looks strange (and, of course, WEIRD). At least as strange as trying to fix basketball to stop favoring tall people.
Book review: The AI Does Not Hate You: Superintelligence, Rationality and the Race to Save the World, by Tom Chivers.
This book is a sympathetic portrayal of the rationalist movement by a quasi-outsider. It includes a well-organized explanation of why some people expect that AI will create large risks sometime this century, written in simple language that is suitable for a broad audience.
Caveat: I know many of the people who are described in the book. I’ve had some sort of connection with the rationalist movement since before it became distinct from transhumanism, and I’ve been mostly an insider since 2012. I read this book mainly because I was interested in how the rationalist movement looks to outsiders.
Chivers is a science writer. I normally avoid books by science writers, due to an impression that they mostly focus on telling interesting stories, without developing a deep understanding of the topics they write about.
Chivers’ understanding of the rationalist movement doesn’t quite qualify as deep, but he was surprisingly careful to read a lot about the subject, and to write only things he did understand.
Many times I reacted to something he wrote with “that’s close, but not quite right”. Usually when I reacted that way, Chivers did a good job of describing the rationalist message in question, and the main problem was either that rationalists haven’t figured out how to explain their ideas in a way that a broad audience can understand, or that rationalists are confused. So the complaints I make in the rest of this review are at most weakly directed in Chivers’ direction.
I saw two areas where Chivers overlooked something important.
One involves CFAR.
Chivers wrote seven chapters on biases, and how rationalists view them, ending with “the most important bias”: knowing about biases can make you more biased. (italics his).
I get the impression that Chivers is sweeping this problem under the rug (Do we fight that bias by being aware of it? Didn’t we just read that that doesn’t work?). That is roughly what happened with many people who learned rationalism solely via written descriptions.
Then much later, when describing how he handled his conflicting attitudes toward the risks from AI, he gives a really great description of maybe 3% of what CFAR teaches (internal double crux), much like a blind man giving a really clear description of the upper half of an elephant’s trunk. He prefaces this narrative with the apt warning: “I am aware that this all sounds a bit mystical and self-helpy. It’s not.”
Chivers doesn’t seem to connect this exercise with the goal of overcoming biases. Maybe he was too busy applying the technique on an important problem to notice the connection with his prior discussions of Bayes, biases, and sanity. It would be reasonable for him to argue that CFAR’s ideas have diverged enough to belong in a separate category, but he seems to put them in a different category by accident, without realizing that many of us consider CFAR to be an important continuation of rationalists’ interest in biases.
Chivers comes very close to covering all of the layman-accessible claims that Yudkowsky and Bostrom make. My one complaint here is that he only gives vague hints about why one bad AI can’t be stopped by other AIs.
A key claim of many leading rationalists is that AI will have some winner-take-all dynamics that will lead to one AI having a decisive strategic advantage after it crosses some key threshold, such as human-level intelligence.
This is a controversial position that is somewhat connected to foom (fast takeoff), but which might be correct even without foom.
“If I stop caring about chess, that won’t help me win any chess games, now will it?” – That chapter title provides a good explanation of why a simple AI would continue caring about its most fundamental goals.
Is that also true of an AI with more complex, human-like goals? Chivers is partly successful at explaining how to apply the concept of a utility function to a human-like intelligence. Rationalists (or at least those who actively research AI safety) have a clear meaning here, at least as applied to agents that can be modeled mathematically. But when laymen try to apply that to humans, confusion abounds, due to the ease of conflating subgoals with ultimate goals.
Chivers tries to clarify, using the story of Odysseus and the Sirens, and claims that the Sirens would rewrite Odysseus’ utility function. I’m not sure how we can verify that the Sirens work that way, or whether they would merely persuade Odysseus to make false predictions about his expected utility. Chivers at least states clearly that the Sirens try to prevent Odysseus (by making him run aground) from doing what his pre-Siren utility function advises. Chivers’ point could be a bit clearer if he specified that in his (nonstandard?) version of the story, the Sirens make Odysseus want to run aground.
“Essentially, he [Yudkowsky] (and the Rationalists) are thoroughgoing utilitarians.” – That’s a bit misleading. Leading rationalists are predominantly consequentialists, but mostly avoid committing to a moral system as specific as utilitarianism. Leading rationalists also mostly endorse moral uncertainty. Rationalists mostly endorse utilitarian-style calculation (which entails some of the controversial features of utilitarianism), but are careful to combine that with worry about whether we’re optimizing the quantity that we want to optimize.
I also recommend Utilitarianism and its discontents as an example of one rationalist’s nuanced partial endorsement of utilitarianism.
Chivers describes Holden Karnofsky as wanting “to get governments and tech companies to sign treaties saying they’ll submit any AGI designs to outside scrutiny before switching them on. It wouldn’t be iron-clad, because firms might simply lie”.
Most rationalists seem pessimistic about treaties such as this.
Lying is hardly the only problem. This idea assumes that there will be a tiny number of attempts, each with a very small number of launches that look like the real thing, as happened with the first moon landing and the first atomic bomb. Yet the history of software development suggests it will be something more like hundreds of attempts that look like they might succeed. I wouldn’t be surprised if there are millions of times when an AI is turned on, and the developer has some hope that this time it will grow into a human-level AGI. There’s no way that a large number of designs will get sufficient outside scrutiny to be of much use.
And if a developer is trying new versions of their system once a day (e.g. making small changes to a number that controls, say, openness to new experience), any requirement to submit all new versions for outside scrutiny would cause large delays, creating large incentives to subvert the requirement.
So any realistic treaty would need provisions that identify a relatively small set of design choices that need to be scrutinized.
I see few signs that any experts are close to developing a consensus about what criteria would be appropriate here, and I expect that doing so would require a significant fraction of the total wisdom needed for AI safety. I discussed my hope for one such criterion in my review of Drexler’s Reframing Superintelligence paper.
Chivers mentions several plausible explanations for what he labels the “semi-death of LessWrong”, the most obvious being that Eliezer Yudkowsky finished most of the blogging that he had wanted to do there. But I’m puzzled by one explanation that Chivers reports: “the attitude … of thinking they can rebuild everything”. Quoting Robin Hanson:
At Xanadu they had to do everything different: they had to organize their meetings differently and orient their screens differently and hire a different kind of manager, everything had to be different because they were creative types and full of themselves. And that’s the kind of people who started the Rationalists.
That seems like a partly apt explanation for the demise of the rationalist startups MetaMed and Arbital. But LessWrong mostly copied existing sites, such as Reddit, and was only ambitious in the sense that Eliezer was ambitious about what ideas to communicate.
I guess a book about rationalists can’t resist mentioning polyamory. “For instance, for a lot of people it would be difficult not to be jealous.” Yes, when I lived in a mostly monogamous culture, jealousy seemed pretty standard. That attitude melted away when the bay area cultures that I associated with started adopting polyamory or something similar (shortly before the rationalists became a culture). Jealousy has much more purpose if my partner is flirting with monogamous people than if he’s flirting with polyamorists.
Less dramatically, “We all know people who are afraid of visiting their city centres because of terrorist attacks, but don’t think twice about driving to work.”
This suggests some weird filter bubbles somewhere. I thought that fear of cities got forgotten within a month or so after 9/11. Is this a difference between London and the US? Am I out of touch with popular concerns? Does Chivers associate more with paranoid people than I do? I don’t see any obvious answer.
It would be really nice if Chivers and Yudkowsky could team up to write a book, but this book is a close substitute for such a collaboration.
See also Scott Aaronson’s review.
The point of this blog post feels almost too obvious to be worth saying, yet I doubt that it’s widely followed.
People often avoid doing projects that have a low probability of success, even when the expected value is high. To counter this bias, I recommend that you mentally combine many such projects into a strategy of trying new things, and evaluate the strategy’s probability of success.
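A toy calculation (all numbers hypothetical) shows why the bundled view feels so different from the project-by-project view: each experiment alone will probably fail, yet the strategy as a whole will probably succeed, and its expected value is clearly positive.

```python
# Toy model with hypothetical numbers: each independent experiment has a
# 10% chance of a payoff worth 100 units, and costs 5 units of time to try.
p_success = 0.10
payoff = 100
cost = 5
n_projects = 10

# Evaluated alone, each project fails 90% of the time...
ev_single = p_success * payoff - cost  # ...yet has positive expected value.

# Evaluated as one bundled strategy, failure is no longer the likely outcome.
p_at_least_one = 1 - (1 - p_success) ** n_projects
ev_bundle = n_projects * ev_single

print(f"EV per project: {ev_single}")                      # 5.0
print(f"P(at least one success): {p_at_least_one:.2f}")    # 0.65
print(f"EV of the bundle: {ev_bundle}")                    # 50.0
```

The strategy has roughly a 65% chance of producing at least one win, which is the kind of number that feels worth persisting on, even though any single project looks like a long shot.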
Eliezer says in On Doing the Improbable:
I’ve noticed that, by my standards and on an Eliezeromorphic metric, most people seem to require catastrophically high levels of faith in what they’re doing in order to stick to it. By this I mean that they would not have stuck to writing the Sequences or HPMOR or working on AGI alignment past the first few months of real difficulty, without assigning odds in the vicinity of 10x what I started out assigning that the project would work. … But you can’t get numbers in the range of what I estimate to be something like 70% as the required threshold before people will carry on through bad times. “It might not work” is enough to force them to make a great effort to continue past that 30% failure probability. It’s not good decision theory but it seems to be how people actually work on group projects where they are not personally madly driven to accomplish the thing.
I expect this reluctance to work on projects with a large chance of failure is a widespread problem for individual self-improvement experiments.
One piece of advice I got from my CFAR workshop was to try lots of things. Their reasoning involved the expectation that we’d repeat the things that worked, and forget the things that didn’t work.
I’ve been hesitant to apply this advice to things that feel unlikely to work, and I expect other people have similar reluctance.
The relevant kind of “things” are experiments that cost maybe 10 to 100 hours to try, which don’t risk much other than wasting time, and for which I should expect on the order of a 10% chance of noticeable long-term benefits.
Here are some examples of the kind of experiments I have in mind:
I’ve cheated slightly, by being more likely to add something to this list if it worked for me than if it was a failure that I’d rather forget. So my success rate with these was around 50%.
The simple practice of forgetting about the failures and mostly repeating the successes is almost enough to cause the net value of these experiments to be positive. More importantly, I kept the costs of these experiments low, so the benefits of the top few outweighed the costs of the failures by a large factor.
I face a similar situation when I’m investing.
The probability that I’ll make any profit on a given investment is close to 50%, and the probability of beating the market on a given investment is lower. I don’t calculate actual numbers for that, because doing so would be more likely to bias me than to help me.
I would find it rather discouraging to evaluate each investment separately. Doing so would focus my attention on the fact that any individual result is indistinguishable from luck.
Instead, I focus my evaluations much more on bundles of hundreds of trades, often associated with a particular strategy. Aggregating evidence in that manner smooths out the good and bad luck to make my skill (or lack thereof) more conspicuous. I’m focusing in this post not on the logical interpretation of evidence, but on how the subconscious parts of my mind react. This mental bundling of tasks is particularly important for my subconscious impressions of whether I’m being productive.
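A minimal sketch of why bundling trades exposes skill (the edge and noise figures are hypothetical): the standard error of the average return shrinks with the square root of the number of trades, so a small real edge that is invisible in one trade becomes conspicuous over a few hundred.

```python
import math

# Hypothetical trade statistics: a small real edge buried in noise.
edge = 0.5    # mean profit per trade (arbitrary units)
noise = 10.0  # standard deviation of a single trade's outcome

def signal_to_noise(n_trades: int) -> float:
    """Edge divided by the standard error of the mean over n trades."""
    return edge / (noise / math.sqrt(n_trades))

# One trade: the edge is swamped by luck.
print(signal_to_noise(1))    # 0.05
# A bundle of 400 trades: skill (or its absence) stands out.
print(signal_to_noise(400))  # 1.0
```

The same arithmetic applies to self-improvement experiments: judging a strategy over many attempts smooths out luck in a way that judging each attempt separately never can.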
I believe this is a well-known insight (possibly from poker?), but I can’t figure out where I’ve seen it described.
I’ve partly applied this approach to self-improvement tasks (not quite as explicitly as I ought to), and it has probably helped.
Book review: Time Biases: A Theory of Rational Planning and Personal Persistence, by Meghan Sullivan.
I was very unsure about whether this book would be worth reading, as it could easily have been focused on complaints about behavior that experts have long known are mistaken.
I was pleasantly surprised when it quickly got to some of the really hard questions, and was thoughtful about what questions deserved attention. I disagree with enough of Sullivan’s premises that I have significant disagreements with her conclusions. Yet her reasoning is usually good enough that I’m unsure what to make of our disagreements – they’re typically due to differences of intuition that she admits are controversial.
I had hoped for some discussion of ethics (e.g. what discount rate to use in evaluating climate change), whereas the book focuses purely on prudential rationality (i.e. what’s rational for a self-interested person). Still, the discussion of prudential rationality covers most of the issues that make the ethical choices hard.
A key issue is the nature of personal identity – does one’s identity change over time?
No, this isn’t about cutlery.
I’m proposing to fork science in the sense that Bitcoin was forked, into an adversarial science and a crowdsourced science.
As with Bitcoin, I have no expectation that the two branches will be equal.
These ideas could apply to most fields of science, but some fields need change more than others. Controversies over p-values and p-hacking are signs that a field needs change. Fields that don’t care much about p-values, e.g. physics and computer science, don’t need as much change. I’ll focus mainly on medicine and psychology, and leave aside the harder-to-improve social sciences.
The term “science” has a range of meanings.
One extreme focuses on “perform experiments in order to test hypotheses”, as in The Scientist In The Crib. I’ll call this the personal knowledge version of science.
A different extreme includes formal institutions such as peer review, RCTs, etc. I’ll call this the authoritative knowledge version of science.
Both of these meanings of the word science are floating around, with little effort to distinguish them. I suspect that promotes confusion about what standards to apply to scientific claims. And I’m concerned that people will use the high status of authoritative science to encourage us to ignore knowledge that doesn’t fit within its paradigm.