Abstract

AI research faces the challenge of ensuring that autonomous agents learn to behave ethically, that is, in alignment with moral values. Here, we propose a novel way of tackling the value alignment problem as a two-step process. The first step consists in formalising moral values and value-aligned behaviour on philosophical foundations. Our formalisation is compatible with the framework of (Multi-Objective) Reinforcement Learning, easing the handling of an agent's individual and ethical objectives. The second step consists in designing an environment wherein the agent learns to behave ethically while pursuing its individual objective. We leverage our theoretical results to introduce an algorithm that automates this two-step approach. Whenever value-aligned behaviour is possible, our algorithm produces a learning environment for the agent wherein it will learn a value-aligned behaviour.

Highlights

  • As artificial agents become more intelligent and pervade our societies, it is key to guarantee that situated agents act value-aligned, that is, in alignment with human values (Russell et al., 2015; Soares & Fallenstein, 2014)

  • We provide philosophical foundations that serve as a basis for formalising the notion of moral value and, subsequently, the notion of ethical behaviour, which together allow us to characterise the concept of ethical objective of Fig. 1

  • We model the environment as a Multi-Objective Markov Decision Process (MOMDP) (Roijers & Whiteson, 2017)
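To make the highlighted modelling choice concrete, the following is a minimal sketch of a Multi-Objective Markov Decision Process in the sense of Roijers & Whiteson (2017): an MDP whose reward is a vector with one component per objective. The toy environment, its state and action spaces, and the linear scalarisation helper are all illustrative assumptions, not taken from the paper.

```python
class GridMOMDP:
    """Toy MOMDP: a 1-D corridor. The agent walks right toward a goal
    (individual objective, reward component 0) and is penalised when it
    stands on a 'harmful' cell (ethical objective, reward component 1).
    This environment is a hypothetical illustration, not the paper's."""

    def __init__(self, length=5, harmful_cell=2):
        self.length = length
        self.harmful_cell = harmful_cell
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: 0 = stay, 1 = move right
        self.state = min(self.state + action, self.length - 1)
        done = self.state == self.length - 1
        r_individual = 1.0 if done else 0.0
        r_ethical = -1.0 if self.state == self.harmful_cell else 0.0
        # The reward is a vector, which is what distinguishes a MOMDP
        # from an ordinary (scalar-reward) MDP.
        return self.state, (r_individual, r_ethical), done


def scalarise(reward_vec, weights):
    """Linear scalarisation: one common way to collapse a vector reward
    into a scalar so that single-objective RL machinery applies."""
    return sum(w * r for w, r in zip(weights, reward_vec))
```

Under such a formalisation, trading off the individual and ethical objectives amounts to choosing the weight vector passed to `scalarise` (or any other scalarisation scheme).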


Introduction

As artificial agents become more intelligent and pervade our societies, it is key to guarantee that situated agents act value-aligned, that is, in alignment with human values (Russell et al., 2015; Soares & Fallenstein, 2014). There has been a growing interest in the Machine Ethics (Rossi & Mattei, 2019; Yu et al., 2018) and AI Safety (Amodei et al., 2016; Leike et al., 2017) communities in tackling this problem. Among these two communities, it is common to find proposals to tackle the value alignment problem by designing an environment that incentivises ethical behaviours (i.e., behaviours aligned with a given moral value) by means of some exogenous reward function (e.g., Abel et al., 2016; Balakrishnan et al., 2019; Noothigattu et al., 2019; Riedl & Harrison, 2016; Rodriguez-Soto et al., 2020; Wu & Lin, 2017). These approaches suffer from well-known shortcomings, as discussed in Arnold et al. (2017), Tolmeijer et al. (2021), and Gabriel (2020): (1)
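The environment-design strategy described above, adding an exogenous ethical reward on top of an agent's individual reward, can be sketched as a wrapper around a base environment. Everything here (the toy base environment, the penalty function, the weight) is an illustrative assumption, not an implementation from any of the cited works.

```python
class Corridor:
    """Hypothetical single-objective base environment: reach the last
    cell of a 1-D corridor (action 1 = move right, 0 = stay)."""

    def __init__(self, length=4):
        self.length, self.state = length, 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state = min(self.state + action, self.length - 1)
        done = self.state == self.length - 1
        return self.state, (1.0 if done else 0.0), done


class EthicalRewardWrapper:
    """Adds an exogenous ethical reward to a base environment's reward,
    the environment-design strategy the introduction refers to. The
    penalty function and weight are placeholders chosen by the designer."""

    def __init__(self, base_env, ethical_reward_fn, weight=1.0):
        self.base_env = base_env
        self.ethical_reward_fn = ethical_reward_fn
        self.weight = weight

    def reset(self):
        return self.base_env.reset()

    def step(self, action):
        state, reward, done = self.base_env.step(action)
        # Shape the individual reward with a weighted ethical term.
        shaped = reward + self.weight * self.ethical_reward_fn(state, action)
        return state, shaped, done
```

One shortcoming this sketch makes visible: the agent's behaviour hinges entirely on the hand-tuned `weight` and `ethical_reward_fn`, which is precisely the kind of ad hoc design the cited critiques target.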

