Abstract

AI research faces the challenge of ensuring that autonomous agents learn to behave ethically, that is, in alignment with moral values. Here, we propose a novel way of tackling the value alignment problem as a two-step process. The first step consists in formalising moral values and value-aligned behaviour on philosophical foundations. Our formalisation is compatible with the framework of (Multi-Objective) Reinforcement Learning, easing the handling of an agent's individual and ethical objectives. The second step consists in designing an environment wherein the agent learns to behave ethically while pursuing its individual objective. We leverage our theoretical results to introduce an algorithm that automates this two-step approach. Whenever value-aligned behaviour is possible, our algorithm produces a learning environment for the agent wherein it will learn a value-aligned behaviour.

Highlights

  • As artificial agents become more intelligent and pervade our societies, it is key to guarantee that situated agents act value-aligned, that is, in alignment with human values (Russell et al., 2015; Soares & Fallenstein, 2014)

  • We provide philosophical foundations that serve as a basis for formalising the notion of moral value and, subsequently, the notion of ethical behaviour, which together allow us to characterise the concept of ethical objective of Fig. 1

  • We model the environment as a Multi-Objective Markov Decision Process (MOMDP) (Roijers & Whiteson, 2017)
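To make the highlighted modelling choice concrete, the following is a minimal sketch of a Multi-Objective Markov Decision Process in the sense of Roijers & Whiteson (2017): an MDP whose reward is a vector with one component per objective. The toy environment, its state and action spaces, and the linear scalarisation helper are all illustrative assumptions, not taken from the paper.

```python
class GridMOMDP:
    """Toy MOMDP: a 1-D corridor. The agent walks right toward a goal
    (individual objective, reward component 0) and is penalised when it
    stands on a 'harmful' cell (ethical objective, reward component 1).
    This environment is a hypothetical illustration, not the paper's."""

    def __init__(self, length=5, harmful_cell=2):
        self.length = length
        self.harmful_cell = harmful_cell
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: 0 = stay, 1 = move right
        self.state = min(self.state + action, self.length - 1)
        done = self.state == self.length - 1
        r_individual = 1.0 if done else 0.0
        r_ethical = -1.0 if self.state == self.harmful_cell else 0.0
        # The reward is a vector, which is what distinguishes a MOMDP
        # from an ordinary (scalar-reward) MDP.
        return self.state, (r_individual, r_ethical), done


def scalarise(reward_vec, weights):
    """Linear scalarisation: one common way to collapse a vector reward
    into a scalar so that single-objective RL machinery applies."""
    return sum(w * r for w, r in zip(weights, reward_vec))
```

Under such a formalisation, trading off the individual and ethical objectives amounts to choosing the weight vector passed to `scalarise` (or any other scalarisation scheme).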


Introduction

As artificial agents become more intelligent and pervade our societies, it is key to guarantee that situated agents act value-aligned, that is, in alignment with human values (Russell et al., 2015; Soares & Fallenstein, 2014). There has been a growing interest in the Machine Ethics (Rossi & Mattei, 2019; Yu et al., 2018) and AI Safety (Amodei et al., 2016; Leike et al., 2017) communities in tackling this problem. Among these two communities, it is common to find proposals to tackle the value alignment problem by designing an environment that incentivises ethical behaviours (i.e., behaviours aligned with a given moral value) by means of some exogenous reward function (e.g., Abel et al., 2016; Balakrishnan et al., 2019; Noothigattu et al., 2019; Riedl & Harrison, 2016; Rodriguez-Soto et al., 2020; Wu & Lin, 2017). These approaches suffer from well-known shortcomings, as discussed in Arnold et al. (2017), Tolmeijer et al. (2021), and Gabriel (2020): (1)
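The environment-design strategy described above, adding an exogenous ethical reward on top of an agent's individual reward, can be sketched as a wrapper around a base environment. Everything here (the toy base environment, the penalty function, the weight) is an illustrative assumption, not an implementation from any of the cited works.

```python
class Corridor:
    """Hypothetical single-objective base environment: reach the last
    cell of a 1-D corridor (action 1 = move right, 0 = stay)."""

    def __init__(self, length=4):
        self.length, self.state = length, 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state = min(self.state + action, self.length - 1)
        done = self.state == self.length - 1
        return self.state, (1.0 if done else 0.0), done


class EthicalRewardWrapper:
    """Adds an exogenous ethical reward to a base environment's reward,
    the environment-design strategy the introduction refers to. The
    penalty function and weight are placeholders chosen by the designer."""

    def __init__(self, base_env, ethical_reward_fn, weight=1.0):
        self.base_env = base_env
        self.ethical_reward_fn = ethical_reward_fn
        self.weight = weight

    def reset(self):
        return self.base_env.reset()

    def step(self, action):
        state, reward, done = self.base_env.step(action)
        # Shape the individual reward with a weighted ethical term.
        shaped = reward + self.weight * self.ethical_reward_fn(state, action)
        return state, shaped, done
```

One shortcoming this sketch makes visible: the agent's behaviour hinges entirely on the hand-tuned `weight` and `ethical_reward_fn`, which is precisely the kind of ad hoc design the cited critiques target.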

