Abstract

This article introduces a continuous reinforcement learning framework that enables online adaptation of multi-objective optimization functions for guiding a mobile robot through dynamically changing environments. With this framework, the robot can continuously learn from multiple or changing environments in which it encounters different numbers of obstacles moving in unknown ways at different times. Using both planned trajectories from a real-time motion planner and already-executed trajectories as feedback observations, our reinforcement learning agent enables the robot to adapt its motion behaviors to environmental changes. The agent contains a Q network connected to a long short-term memory network. The proposed framework is tested in both simulations and real-robot experiments over varied, dynamically changing task environments. The results show the efficacy of online continuous reinforcement learning for quick adaptation to different, unknown, and dynamic environments.

Highlights

  • Real-time motion planning of robots often needs to consider multiple and sometimes conflicting optimization criteria, such as time efficiency, safety, and energy efficiency.[1,2,3] A common practice is to combine these criteria in a cost function as a weighted sum.

  • The main contribution of this article is that we propose to tackle both problems by continuously training a reinforcement learning (RL) agent in different environments, even during testing.

  • The weights of the target network are updated to learn how to adjust the coefficients of a multi-objective cost function in a real-time motion planner, and the weights of the online network are updated to enable the agent to keep learning from different kinds of environments continuously (see the sketch after this list).
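
As a rough illustration of the online/target network pair described in the highlights, the sketch below pairs an LSTM-based Q network (as named in the abstract) with a slowly tracking target copy. The layer sizes, action encoding, and soft-update rate are illustrative assumptions, not the paper's exact design.

```python
import copy
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Q network fed by an LSTM, as the abstract describes at a high level.
    Hidden size and action encoding are illustrative assumptions."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        # The LSTM summarizes a sequence of trajectory-based observations.
        self.lstm = nn.LSTM(obs_dim, 64, batch_first=True)
        # The head maps the final hidden state to Q-values over discrete
        # coefficient-adjustment actions.
        self.head = nn.Linear(64, n_actions)

    def forward(self, obs_seq):           # obs_seq: (batch, time, obs_dim)
        out, _ = self.lstm(obs_seq)
        return self.head(out[:, -1])      # Q-values from the final step

online = QNet(obs_dim=8, n_actions=9)     # hypothetical dimensions
target = copy.deepcopy(online)            # target starts as an exact copy

def soft_update(target_net, online_net, tau=0.005):
    """Let the target network slowly track the online network, stabilizing
    TD targets while the online network keeps learning continuously."""
    with torch.no_grad():
        for t_p, o_p in zip(target_net.parameters(), online_net.parameters()):
            t_p.mul_(1.0 - tau).add_(tau * o_p)
```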

Introduction

Real-time motion planning of robots often needs to consider multiple and sometimes conflicting optimization criteria, such as time efficiency (in terms of the shortest distance or time), safety (in terms of the clearance to obstacles), and energy efficiency.[1,2,3] A common practice is to combine these criteria in a cost function as a weighted sum. There are two related open problems: (1) how to determine values for the coefficients of a compound optimization function automatically and (2) how to make the coefficients self-adapt to environmental changes. The main contribution of this article is that we propose to tackle both problems by continuously training a reinforcement learning (RL) agent in different environments, even during testing. The agent is trained to adjust the values of the coefficients of a multi-objective optimization function based on the robot's performance in an environment with unknown dynamic obstacles, and the agent keeps learning by itself to best adapt to all kinds of environmental changes continuously. A convolutional neural network (NN) that would otherwise be used to extract observation features can be removed from the agent; this simplifies the formulation of the agent and enables continuous learning. Learning continuously means that the agent can accumulate the knowledge learned in past environments to help future learning and problem-solving, and that later learning does not significantly degrade its performance in task environments learned earlier.[5,6,7]
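
To make the weighted-sum formulation concrete, here is a minimal sketch of a compound trajectory cost whose coefficients an RL agent could adjust between planning cycles. The criterion names, the candidate representation, and the initial weights are hypothetical illustrations, not the paper's notation.

```python
import numpy as np

def trajectory_cost(candidate, weights):
    """Weighted-sum multi-objective cost for one candidate trajectory.

    candidate : dict of raw per-criterion scores, assumed to be computed
                by the real-time motion planner for this candidate.
    weights   : array [w_time, w_safety, w_energy]; these are the
                coefficients the RL agent adapts online.
    """
    criteria = np.array([
        candidate["time"],       # time efficiency: travel time or path length
        candidate["proximity"],  # safety: penalty growing as clearance shrinks
        candidate["energy"],     # energy: e.g., accumulated control effort
    ])
    return float(np.dot(weights, criteria))

# The planner keeps the candidate minimizing the compound cost; the
# agent's action changes `weights` as the environment changes.
weights = np.array([0.5, 0.3, 0.2])               # hypothetical values
candidates = [
    {"time": 4.1, "proximity": 0.8, "energy": 2.0},
    {"time": 5.0, "proximity": 0.2, "energy": 1.5},
]
best = min(candidates, key=lambda c: trajectory_cost(c, weights))
```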
