Abstract

Model-free reinforcement learning (RL) learns an optimal control policy from process data alone. However, directly applying model-free RL to a practical process carries a high risk of failure because the amount of available data and the number of trial runs are limited. Moreover, state constraints are likely to be violated during the learning period. In this work, we propose the Q-MPC framework, an algorithm that integrates RL and model predictive control (MPC) for safe learning. Q-MPC learns the action-value function in an off-policy fashion and solves a model-based optimal control problem in which the trained action-value function is assigned as the terminal cost. Because Q-MPC utilizes a model, the state constraints can be respected during the learning period. In a simulation study, Q-MPC, MPC, and double deep Q-network (DDQN) were applied with varying prediction horizons. The results show that Q-MPC outperforms MPC by mitigating the effect of model-plant mismatch and incurs far fewer constraint violations than DDQN.
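As a rough illustration of the control step described above, the sketch below solves an N-step model-based optimal control problem whose terminal cost is a learned action-value function, and applies only the first input in receding-horizon fashion. Everything concrete here is an assumption for illustration: the linear dynamics f, the quadratic stage cost, the quadratic stand-in for the trained Q-function, the input bounds, and the function names (q_mpc_action, q_value) are not the paper's actual process model, network, or solver; scipy.optimize.minimize is used only as a generic placeholder optimizer.

```python
# Hedged sketch of a Q-MPC control step: minimize the N-step cost
#   sum_k stage_cost(x_k, u_k) + Q(x_terminal, u_terminal)
# subject to input bounds, using an assumed plant model for prediction.
import numpy as np
from scipy.optimize import minimize


def f(x, u):
    """Assumed discrete-time plant model (placeholder linear dynamics)."""
    A = np.array([[1.0, 0.1], [0.0, 0.9]])
    B = np.array([[0.0], [0.1]])
    return A @ x + B @ u


def stage_cost(x, u):
    """Placeholder quadratic stage cost."""
    return float(x @ x + 0.1 * u @ u)


def q_value(x, u):
    """Stand-in for the trained action-value function Q(x, u)."""
    return float(2.0 * x @ x + 0.1 * u @ u)


def q_mpc_action(x0, horizon=5, u_dim=1, u_bounds=(-1.0, 1.0)):
    """Solve the finite-horizon problem with Q as terminal cost; return the first input."""

    def objective(u_flat):
        u_seq = u_flat.reshape(horizon, u_dim)
        x, cost = x0, 0.0
        for k in range(horizon - 1):
            cost += stage_cost(x, u_seq[k])
            x = f(x, u_seq[k])
        # Terminal cost: learned Q evaluated at the predicted terminal state
        # (one possible placement of the Q term; the paper's exact formulation may differ).
        cost += q_value(x, u_seq[-1])
        return cost

    u0 = np.zeros(horizon * u_dim)
    bounds = [u_bounds] * (horizon * u_dim)
    res = minimize(objective, u0, bounds=bounds, method="L-BFGS-B")
    return res.x[:u_dim]  # receding horizon: apply only the first input


if __name__ == "__main__":
    x = np.array([1.0, 0.0])
    print("first control input:", q_mpc_action(x))
```

Because the predicted trajectory comes from the model, input and state constraints can be imposed on it directly during learning, which is the safety mechanism the abstract emphasizes relative to a purely model-free method such as DDQN.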
