Steve Omohundro recently wrote a paper and gave a talk (a video should become available soon) on AI ethics, with arguments whose central concerns resemble Eliezer Yudkowsky's. I find Steve's presentation more organized and more likely to convince mainstream researchers than Eliezer's best attempt so far.
Steve avoids Eliezer’s suspicious claims about how fast AI will take off, and phrases his arguments in ways that are largely independent of the takeoff speed. But a sentence or two in the conclusion of his paper suggests that he is leaning toward solutions which assume multiple AIs will be able to safeguard against a single AI imposing its goals on the world. He doesn’t appear to have a good reason to consider this assumption reliable, but at least he doesn’t show the kind of disturbing certainty that Eliezer has about the first self-improving AI becoming powerful enough to take over the world.
Possibly the most important news in Steve's talk was his statement that he had largely stopped working on creating intelligent software because of his concerns about safely specifying goals for an AI. He indicated that one important insight contributing to this change of mind came when Carl Shulman pointed out a flaw in Steve's proposal for a utility function that included a goal of the AI shutting itself off after a specified time (the flaw involves the small chance that physics differs from apparent physics, and how the AI would evaluate the expected utilities of outcomes under that improbable physics).
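I don't know the details of the flaw beyond that description, but the general failure mode it points at is easy to sketch: if some hypothesis assigns an astronomically large payoff to an action, even a tiny prior probability on that hypothesis can dominate the expected-utility comparison and override the intended shutdown behavior. Here is a toy version of that kind of failure; the hypothesis names, probabilities, and payoffs are purely my own illustration, not anything from Steve's proposal or Carl's analysis.

```python
# Toy illustration (made-up numbers): how a tiny-probability hypothesis
# can dominate an expected-utility comparison and override a shutdown goal.

# Hypothetical hypotheses the agent entertains:
# (name, prior probability,
#  utility if the agent shuts down on schedule,
#  utility if it instead spends resources hedging against this hypothesis)
hypotheses = [
    ("apparent physics",    0.999999, 1.0,   0.9),
    ("alternative physics", 0.000001, 0.0, 1e12),  # assumed: shutdown clause pays off poorly here
]

def expected_utility(action_index):
    """Expected utility of an action, summed over the agent's hypotheses."""
    return sum(prob * utilities[action_index]
               for (_, prob, *utilities) in hypotheses)

eu_shutdown = expected_utility(0)   # comply with the shutdown goal
eu_hedge    = expected_utility(1)   # divert resources toward the improbable hypothesis

print(f"E[U | shut down on schedule]      = {eu_shutdown:.6f}")
print(f"E[U | hedge against weird physics] = {eu_hedge:.2f}")
# With these made-up numbers, hedging wins despite the 1e-6 prior,
# because the payoff under the improbable hypothesis is enormous.
```

The point of the sketch is only that the improbable-physics term can swamp the rest of the calculation unless the utility function is constructed to prevent that, which seems to be the sort of difficulty that changed Steve's mind.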