Reward tampering and evolutionary computation: a study of concrete AI-safety problems using evolutionary algorithms

Mathias K Nilsen,Tønnes F Nygaard,Kai Olav Ellefsen

doi:10.1007/s10710-023-09456-0

Open Access

https://doi.org/10.1007/s10710-023-09456-0

Abstract

Reward tampering is a problem that will impact the trustworthiness of the powerful AI systems of the future. It describes the situation in which AI agents bypass their intended objective, enabling unintended and potentially harmful behaviours. This paper investigates whether the creative potential of evolutionary algorithms could help ensure trustworthy solutions when facing this problem. Evolutionary algorithms may help combat reward tampering because they can find a diverse collection of solutions to a problem within a single run, aiding the search for desirable solutions. Four different evolutionary algorithms were deployed in tasks illustrating the problem of reward tampering. The algorithms were designed with varying degrees of human expertise, to measure how human guidance influences the ability to discover trustworthy solutions. The results indicate that the algorithms’ ability to find and preserve trustworthy solutions depends strongly on preserving diversity during the search. Algorithms searching for behavioural diversity proved to be the most effective against reward tampering. Human expertise also improved the certainty and quality of safe solutions, but even with only a minimal degree of human expertise, domain-independent diversity management was found to discover safe solutions.
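
To make the idea of searching for behavioural diversity more concrete, the following is a minimal sketch of one common way to score diversity: a novelty-style measure that rates each individual by its mean distance to its nearest neighbours in a behaviour space. The abstract does not specify the four algorithms used, so this is an illustration under assumptions rather than the authors' method; the names `novelty_scores`, `select_diverse`, `describe`, and the parameter `k` are hypothetical.

```python
import math


def novelty_scores(behaviours, k=2):
    """Score each behaviour by its mean Euclidean distance to its k nearest neighbours.

    `behaviours` is a list of equal-length numeric tuples (hypothetical
    behaviour descriptors, e.g. the final position of an agent).
    """
    scores = []
    for i, b in enumerate(behaviours):
        # Distances from this behaviour to every other behaviour, nearest first.
        dists = sorted(
            math.dist(b, other)
            for j, other in enumerate(behaviours)
            if j != i
        )
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest) if nearest else 0.0)
    return scores


def select_diverse(population, describe, n_keep, k=2):
    """Keep the n_keep individuals whose behaviour is most novel.

    `describe` maps an individual to its behaviour descriptor; it is an
    assumed, task-specific function and not something defined in the paper.
    """
    behaviours = [describe(ind) for ind in population]
    scores = novelty_scores(behaviours, k)
    ranked = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)
    return [population[i] for i in ranked[:n_keep]]
```

For example, `select_diverse(population, describe, n_keep=10)` would retain the ten individuals that behave least like their neighbours, regardless of reward. Selecting on behaviour rather than on reward alone is one way a search might keep non-tampering solutions alive even when tampering scores higher.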

Journal: Genetic Programming and Evolvable Machines	Publication Date: Sep 19, 2023
License type: CC BY 4.0

Similar Papers

Domain Decomposition Evolutionary Algorithm for Multi-Modal Function Optimization
Guangming Lin ... Yuping Che
01 Nov 2008

Evolutionary Computation. A Unified Approach. Kenneth A. De Jong. (2006, MIT Press.) £32.95, $50.00, 256 pages
Efrén Mezura-Montes
Artificial Life | VOL. 13
01 Oct 2007

Remembering Alex Fraser and explorations in learning without human expertise
D.B Fogel
18 Nov 2002

Back to the Roots: Multi-X Evolutionary Computation
Abhishek Gupta ... Yew-Soon Ong
Cognitive Computation | VOL. 11
03 Jan 2019