THE ALIGNMENT PROBLEM: Machine Learning and Human Values by Brian Christian. New York: W. W. Norton, 2020. 344 pages. Hardcover; $28.95. ISBN: 9780393635829.

The global conversation about artificial intelligence (AI) is increasingly polemical: "AI will change the world!" "AI will ruin the world!" Amidst the strife, Brian Christian's work stands out. It is thoughtful, nuanced, and, at times, even poetic. Coming on the heels of his two other bestsellers, The Most Human Human and Algorithms to Live By, this meticulously researched recounting of the last decade of research into AI safety provides a broad perspective on the field and its future.

The "alignment problem" of the title refers to the disconnect between what AI does and what we want it to do: in Christian's words, the disconnect between "machine learning and human values." This disconnect has been the subject of intense research in recent years, as companies and academics alike continually discover that AIs inherit the mistakes and biases of their creators.

For example, we train AIs to predict recidivism rates of convicted criminals in hopes of crafting more accurate sentences, but the AIs produce racially biased outcomes. Or we train AIs that map words into mathematical spaces. These AIs can perform mathematical "computations" on words, such as "king - man + woman = queen" and "Paris - France + Italy = Rome." But they also say that "doctor - man + woman = nurse" and "computer programmer - man + woman = homemaker." These examples of racial and gender bias are some of the numerous ways that human bias appears inside the supposedly impartial tools we have created.
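For readers who want to see this word-vector arithmetic firsthand, here is a minimal sketch in Python using the open-source gensim library and pretrained GloVe vectors. The model choice and exact outputs are assumptions for illustration, not details drawn from the book; different embeddings yield somewhat different neighbors.

    # Minimal sketch: analogy arithmetic on pretrained word embeddings.
    # NOTE: "glove-wiki-gigaword-100" is an assumed model choice; the book
    # does not specify which embeddings produced its examples.
    import gensim.downloader as api

    vectors = api.load("glove-wiki-gigaword-100")  # downloads on first run

    # "king - man + woman": the nearest remaining vector is typically "queen".
    print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

    # The same arithmetic can surface learned gender bias:
    # "doctor - man + woman" lands near "nurse" in many embedding spaces.
    print(vectors.most_similar(positive=["doctor", "woman"], negative=["man"], topn=1))

The most_similar call adds and subtracts the given vectors and returns the closest words in the embedding space, which is exactly the kind of "computation" the analogies above describe.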
As Norbert Wiener, a famous mathematician of the mid-twentieth century, put it, "We had better be sure the purpose put into the machine is the purpose which we really desire" (p. 312). The discoveries of the last ten years have shocked researchers into realizing that our machines have purposes we never intended. Christian's message is clear: these mistakes must be fixed before those machines become a permanent part of our everyday lives.

The book is divided into three main sections. The first, Prophecy, provides a historical overview of how researchers uncovered the AI biases that are now well known. It traces how AI models ended up in the public sphere and how people have tried to solve the problems AI creates. Perhaps one of the most interesting anecdotes in this section concerns researchers' attempts to create explainable models to comply with GDPR requirements.

The second section, Agency, explores the alignment problem in the context of reinforcement learning, which involves teaching computer "agents" (that is, AIs) to perform certain tasks using complex reward systems. Time and time again, the reward systems that researchers design have unintended side effects, and Christian recounts numerous humorous examples of this. He explains in simple terms why it is so difficult to correctly motivate the behaviors we wish to see in others (both humans and machines), and what it might take to create machines that are truly curious. This section feels a bit long: Christian dives deeply into the work of a few specific labs and at times loses his logical thread in the weeds, though he eventually emerges.

The final section, Normativity, surveys current efforts to understand and fix the alignment problem. Its subchapters, "Imitation," "Inference," and "Uncertainty," name different qualities that researchers struggle to instill in machines. Imitating correct behaviors while ignoring bad ones is hard, as is getting a machine to perform correctly on data it hasn't seen before. Finally, teaching a model (and the humans reading its results) to correctly interpret uncertainty is an active area of research with no concrete solutions.

After spending over three hundred pages recounting the pitfalls of AI and the difficulties of realigning models with human values, Christian ends on a hopeful note. He postulates that the issues discovered in machine-learning models illuminate societal issues that might otherwise be ignored: "Unfair pretrial detention models, for one thing, shine a spotlight on upstream inequities. Biased language models give us, among other things, a way to measure the state of our discourse and offer us a benchmark against which to try to improve and better ourselves ... In seeing a kind of mind at work as it digests and reacts to the world, we will learn something both about the world and also, perhaps, about minds" (p. 328).

As a Christ-follower, I believe the biases found in AI are both terrible and unsurprising. Humans are imperfect creators. While researchers' efforts to fix biases and shortcomings in AI systems are important and worthwhile, they can never exorcise fallen human nature from AI. Christian's conclusion that biased AI points back to biased humans comes close to this idea but avoids taking an overtly theological stance.

This book is well worth reading for those who wish to better understand the limitations of AI and current efforts to fix them. It weaves together history, mathematics, ethics, and philosophy while remaining accessible to a broad audience through smooth explanations of detailed concepts. You don't need to be an AI expert (or even familiar with AI at all) to appreciate this book's insights.

After you're done reading it, recommend this book to the next person who tells you, with absolute certainty, that AI will either save or ruin the world. Christian's book provides a much-needed dose of sanity and perspective amidst the hype.

Reviewed by Emily Wenger, graduate student in the Department of Computer Science, University of Chicago, Chicago, IL 60637.
