A robust policy bootstrapping algorithm for multi-objective reinforcement learning in non-stationary environments

Sherif Abdelfattah, Jiankun Hu, Kathryn Kasmarik

doi:10.1177/1059712319869313

Open Access

Journal: Adaptive Behavior
Publication Date: Aug 15, 2019
Citations: 3

Affiliation: University of Canberra

Abstract

Multi-objective Markov decision processes are a special kind of multi-objective optimization problem that involves sequential decision making while satisfying the Markov property of stochastic processes. Multi-objective reinforcement learning methods address this kind of problem by fusing the reinforcement learning paradigm with multi-objective optimization techniques. One major drawback of these methods is their lack of adaptability to non-stationary dynamics in the environment, because they adopt optimization procedures that assume stationarity in order to evolve a coverage set of policies that can solve the problem. This article introduces a developmental optimization approach that evolves the policy coverage set online while exploring the preference space over the defined objectives. We propose a novel multi-objective reinforcement learning algorithm that robustly evolves a convex coverage set of policies online in non-stationary environments. We compare the proposed algorithm with two state-of-the-art multi-objective reinforcement learning algorithms in stationary and non-stationary environments. The results show that the proposed algorithm significantly outperforms the existing algorithms in non-stationary environments while achieving comparable results in stationary environments.

Full Text