Abstract

Many real-world decision-making problems involve multiple conflicting objectives that cannot be optimized simultaneously without a compromise. Such problems, known as multi-objective Markov decision processes, constitute a significant challenge for conventional single-objective reinforcement learning methods, especially when an optimal compromise cannot be determined beforehand. Multi-objective reinforcement learning methods address this challenge by finding an optimal coverage set of non-dominated policies that can satisfy any user's preference for solving the problem. However, this comes at the cost of increased computational complexity and time consumption, and a lack of adaptability to non-stationary environment dynamics. Addressing these limitations requires adaptive methods that can solve the problem in an online and robust manner. In this paper, we propose a novel developmental method that utilizes adversarial self-play between an intrinsically motivated preference exploration component and a policy coverage set optimization component; the latter robustly evolves a convex coverage set of policies that solves the problem using the preferences proposed by the former. We show experimentally the effectiveness of the proposed method in comparison to state-of-the-art multi-objective reinforcement learning methods in both stationary and non-stationary environments.
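
As a concrete reading of the terms above: a convex coverage set (CCS) stores, for each retained policy, its vector of expected returns per objective, and a user's preference is typically expressed as a weight vector used for linear scalarization. The following is a minimal sketch of how such a set serves an arbitrary preference; the function name and array layout are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def best_policy_for_preference(ccs_values, preference):
        """Return the index and scalarized value of the CCS policy that is
        optimal for a given preference (linear scalarization weights).

        ccs_values: (n_policies, n_objectives) expected-return vectors.
        preference: (n_objectives,) non-negative weights, typically summing to 1.
        """
        scalarized = ccs_values @ preference  # one scalarized value per policy
        best = int(np.argmax(scalarized))
        return best, float(scalarized[best])

    # Example: three policies on the convex coverage set for two objectives.
    ccs = np.array([[1.0, 0.0],
                    [0.6, 0.6],
                    [0.0, 1.0]])
    print(best_policy_for_preference(ccs, np.array([0.7, 0.3])))  # -> (0, 0.7)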

Highlights

  • Reinforcement learning (RL) is a learning paradigm that works by interacting with the environment in order to evolve an optimal policy guided by the objective of maximizing the return of a reward signal (Sutton and Barto, 1998)

  • The results show the average prediction error over 15 runs for the deep neural network (DNN) prediction model described in Section 4.2, which predicts the expected reward return for each preference fuzzy region given the current performance of the convex coverage set (CCS)

  • The first reason is the adaptive preference exploration mechanism of the intrinsically motivated multi-objective reinforcement learning (IM-MORL) agent, which is guided by the intrinsic motivation to enhance the performance of the predictive model (a minimal sketch of this mechanism follows this list)
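
The two highlights above mention a prediction model over preference fuzzy regions and an intrinsic motivation signal tied to that model's accuracy. The sketch below illustrates one common way such a mechanism can be wired: the prediction error per preference region is tracked and used to bias which region is explored next. The class name, region discretization, and error-to-probability mapping are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    class PreferenceExplorer:
        """Sketch of intrinsically motivated preference-region exploration.

        The agent keeps a running prediction error per preference region and
        proposes regions where its reward-prediction model is still inaccurate,
        so exploration concentrates on poorly understood preferences.
        """

        def __init__(self, n_regions, smoothing=1e-3):
            self.errors = np.ones(n_regions)  # running prediction error per region
            self.smoothing = smoothing

        def sample_region(self):
            # Regions with larger prediction error get proportionally higher
            # probability of being proposed next (assumed sampling rule).
            probs = self.errors + self.smoothing
            probs = probs / probs.sum()
            return int(np.random.choice(len(self.errors), p=probs))

        def update(self, region, predicted_return, observed_return, lr=0.1):
            # The gap between predicted and observed return is the intrinsic
            # signal; a moving average keeps it robust to noisy episodes.
            err = abs(predicted_return - observed_return)
            self.errors[region] = (1 - lr) * self.errors[region] + lr * err

    # Example usage: propose a preference region, evaluate under it,
    # then update the explorer with the model's prediction error.
    explorer = PreferenceExplorer(n_regions=5)
    region = explorer.sample_region()
    explorer.update(region, predicted_return=1.2, observed_return=0.9)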


Summary

Introduction

Reinforcement learning (RL) is a learning paradigm that works by interacting with the environment in order to evolve an optimal policy (action selection strategy) guided by the objective of maximizing the return of a reward signal (Sutton and Barto, 1998). Deep reinforcement learning (DRL) benefits from the automatic hierarchical feature extraction and complex function approximation of deep neural networks (DNNs) (LeCun et al., 2015). This has led to many breakthroughs (Mnih et al., 2015; Silver et al., 2016, 2017) in solving sequential decision-making problems fulfilling the Markov property, known as Markov decision processes (MDPs). Many real-world problems, however, involve multiple conflicting objectives. For example, a search-and-rescue robot may aim to maximize the number of victims found, minimize exposure to fire risk to avoid destruction, and minimize the total task time. Another example could be a patrolling drone aiming at maximizing the area of the scanned region, maximizing the number of detected objects of interest, and maximizing battery life.

Dominance: a solution A dominates a solution B if A is strictly better than B on at least one objective and at least as good as B on all other objectives.
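
The dominance definition above can be stated compactly in code. Below is a minimal sketch for reward vectors where every objective is maximized; the function name is illustrative, not taken from the paper.

    import numpy as np

    def dominates(a, b):
        """True if solution a dominates solution b: a is at least as good as b
        on every objective and strictly better on at least one (all maximized)."""
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        return bool(np.all(a >= b) and np.any(a > b))

    # Two-objective examples (both objectives maximized):
    print(dominates([2.0, 3.0], [2.0, 1.0]))  # True: equal on one, better on the other
    print(dominates([2.0, 1.0], [1.0, 2.0]))  # False: neither solution dominates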

