An experimental study of two predictive reinforcement learning methods and comparison with model-predictive control

Dmitrii Dobriborsci, Pavel Osinenko, Wolfgang Aumer

doi:10.1016/j.ifacol.2022.09.610

Open Access

https://doi.org/10.1016/j.ifacol.2022.09.610

Abstract

Reinforcement learning (RL) has been successfully used in various simulations and computer games. Industry-related applications, such as autonomous mobile robot motion control, nevertheless remain challenging for RL to date. This paper presents an experimental evaluation of predictive RL controllers for optimal mobile robot motion control. Model-predictive control (MPC) serves as the baseline for comparison. Two RL methods are tested: rollout Q-learning, which may be regarded as MPC with the terminal cost given by a Q-function approximation, and so-called stacked Q-learning, which in turn resembles MPC with the running cost replaced by a Q-function approximation. The experimental platform is a mobile robot with a differential drive (Robotis Turtlebot3). The experimental results showed that both RL methods outperformed the baseline in terms of accumulated cost, with the stacked variant performing best. Given the series of previous works on stacked Q-learning, this study supports the idea that MPC with a running cost adaptation inspired by Q-learning has the potential to boost performance while retaining the desirable properties of MPC.
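
The structural difference between the three controllers named in the abstract can be illustrated with a minimal sketch. It only evaluates the finite-horizon objective that each scheme would minimize over an action sequence; it is not the authors' implementation. The function name `predictive_objective`, the scalar toy model `toy_step`, the quadratic `toy_running_cost`, and the placeholder `toy_q_approx` are all hypothetical and introduced here for illustration. In particular, `toy_q_approx` stands in for a learned critic, and the paper's exact formulation (discounting, horizon indexing, critic structure) may differ.

```python
from typing import Callable, Sequence

State = float
Action = float


def predictive_objective(
    x0: State,
    actions: Sequence[Action],
    step: Callable[[State, Action], State],
    running_cost: Callable[[State, Action], float],
    q_approx: Callable[[State, Action], float],
    mode: str,
) -> float:
    """Evaluate the horizon objective of an action sequence.

    mode == "mpc":     sum of running costs over the horizon.
    mode == "rollout": running costs, except the last stage uses the
                       Q-function approximation as a terminal cost.
    mode == "stacked": every stage uses the Q-function approximation
                       in place of the running cost.
    """
    if mode not in ("mpc", "rollout", "stacked"):
        raise ValueError(f"unknown mode: {mode}")

    x = x0
    total = 0.0
    last = len(actions) - 1
    for k, u in enumerate(actions):
        use_q = mode == "stacked" or (mode == "rollout" and k == last)
        total += q_approx(x, u) if use_q else running_cost(x, u)
        x = step(x, u)  # propagate the state with the prediction model
    return total


# Hypothetical toy problem: scalar integrator with a quadratic cost.
def toy_step(x: State, u: Action) -> State:
    return x + u


def toy_running_cost(x: State, u: Action) -> float:
    return x**2 + u**2


def toy_q_approx(x: State, u: Action) -> float:
    # Placeholder for a learned critic; here simply twice the running cost.
    return 2.0 * (x**2 + u**2)


if __name__ == "__main__":
    seq = [-0.5, -0.5]
    for mode in ("mpc", "rollout", "stacked"):
        value = predictive_objective(
            1.0, seq, toy_step, toy_running_cost, toy_q_approx, mode
        )
        print(mode, value)
```

In a full controller, each objective would be passed to a numerical optimizer over the action sequence, and only the first action would be applied before re-planning; the critic weights would be updated from observed data between steps.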

Full Text