Investigating machine moral judgement through the Delphi experiment

Liwei Jiang, Jena D Hwang, Chandra Bhagavatula, Ronan Le Bras, Jenny T Liang, Sydney Levine, Jesse Dodge, Keisuke Sakaguchi, Maxwell Forbes, Jack Hessel, Jon Borchardt, Taylor Sorensen, Saadia Gabriel, Yulia Tsvetkov, Oren Etzioni, Maarten Sap, Regina Rini, Yejin Choi

doi:10.1038/s42256-024-00969-6

Journal: Nature Machine Intelligence
Publication date: Jan 1, 2025
License: CC BY-NC-ND 4.0

Abstract

As our society adopts increasingly powerful artificial intelligence (AI) systems for pervasive use, there are growing concerns about machine morality, or the lack thereof. Millions of users already rely on the outputs of AI systems, such as chatbots, as decision aids. Meanwhile, AI researchers continue to grapple with the challenge of aligning these systems with human morality and values. In response to this challenge, we build and test Delphi, an open-source AI system trained to predict the moral judgements of US participants. Delphi's computational framework is grounded in the approach proposed by the prominent moral philosopher John Rawls. Our results speak to both the promise and the limits of teaching machines about human morality. Delphi generalizes better than off-the-shelf neural language models. At the same time, Delphi's failures underscore important challenges in this arena: for instance, it has limited cultural awareness and is susceptible to pervasive biases. Despite these shortcomings, we demonstrate several compelling use cases of Delphi, including its incorporation as a component within an ensemble of AI systems. Finally, we computationally demonstrate the potential of the hybrid approaches to reliable moral reasoning that Rawls envisioned, inspiring future research in computational morality.

Full Text