Abstract
Plain reinforcement learning (RL) may suffer from failure to converge, constraint violations, unexpected performance degradation, and so on. RL agents commonly require extensive training to achieve proper functionality, in contrast to classical control algorithms, which are typically model-based. One direction of research is the fusion of RL with such algorithms, especially model-predictive control (MPC). This fusion, however, introduces new hyper-parameters related to the prediction horizon. Furthermore, RL is usually formulated over Markov decision processes, yet most real environments are not time-discrete: the actual physical setting of RL consists of a digital agent interacting with a time-continuous dynamical system. There is thus yet another hyper-parameter, the agent sampling time. In this paper, we investigate the effects of the prediction horizon and the sampling time on two hybrid RL-MPC agents in a case study of mobile robot parking, a canonical control problem. We benchmark the agents against a simple variant of MPC. The sampling time showed a “sweet spot” behavior, whereas the RL agents demonstrated merits at shorter horizons.
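To make the two hyper-parameters concrete, here is a minimal Python sketch of a receding-horizon control loop acting on a continuous-time plant under zero-order hold. The unicycle kinematics, quadratic cost, horizon N, and sampling time DT are illustrative assumptions, not the paper's actual agents or benchmark.

```python
import numpy as np
from scipy.optimize import minimize

DT = 0.1  # agent sampling time [s] (hyper-parameter studied in the paper)
N = 5     # prediction horizon, in steps (the other hyper-parameter)

def step(state, action, dt=DT):
    """One zero-order-hold step of simple unicycle kinematics (assumed plant)."""
    x, y, theta = state
    v, omega = action
    return np.array([x + dt * v * np.cos(theta),
                     y + dt * v * np.sin(theta),
                     theta + dt * omega])

def stage_cost(state, action):
    """Quadratic distance to a parking target at the origin (assumed cost)."""
    return np.dot(state, state) + 0.1 * np.dot(action, action)

def mpc_action(state):
    """Minimize the N-step predicted cost and return only the first action."""
    def horizon_cost(flat_actions):
        actions = flat_actions.reshape(N, 2)
        s, total = state, 0.0
        for a in actions:
            total += stage_cost(s, a)
            s = step(s, a)
        return total
    res = minimize(horizon_cost, np.zeros(2 * N), method="SLSQP")
    return res.x[:2]  # receding horizon: apply the first action only

state = np.array([1.0, 1.0, 0.0])
for _ in range(50):  # closed loop, one optimization per sampling period DT
    state = step(state, mpc_action(state))
```

Only the first optimized action is applied at each step, after which the optimization is repeated from the new state; this receding-horizon structure is what makes the prediction horizon and the sampling time two independent tuning knobs.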
Highlights
Reinforcement Learning (RL) shows remarkable performance in playground settings of video and board games such as StarCraft, chess, and Go [1]–[3]
Industry-close applications appear more challenging for RL due to the lack of freedom in training [4]–[8]
Industry is dominated by classical control-theoretic methods such as model-predictive control (MPC) [11]–[13]
Summary
Reinforcement Learning (RL) shows remarkable performance in playground settings of video and board games such as StarCraft, chess, and Go [1]–[3]. Industry-close applications appear more challenging for RL due to the lack of freedom in training [4]–[8], which may be related to limited resources and technical constraints. Industry is dominated by classical control-theoretic methods such as model-predictive control (MPC) [11]–[13]. Somewhat in contrast to classical control, RL aims at a learning-based and, in some configurations, model-free approach. It is perhaps the model-based formal guarantees that make classical control attractive to industry. This work goes along the lines of fusing RL with predictive control and addresses the tuning of the latter.