Abstract
There is growing interest in developing efficient data-driven control methods that can be deployed in digitized manufacturing processes. Model-free reinforcement learning (RL) is a machine learning approach that learns the optimal control policy directly from process data. However, model-free RL exhibits higher cost variance than model-based methods and may require an infeasible amount of data to learn the optimal policy. Motivated by the fact that system identification with a linear model offers high data efficiency and stable performance, this paper proposes combining linear model predictive control (MPC) with Q-learning. The combined scheme, Q-MPC, improves control performance more stably and safely. In the case study, linear MPC, Q-MPC, DDPG, TD3, and SAC are applied to a nonlinear benchmark system, with the comparison focusing on learning speed and cost variance.
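The abstract gives no implementation details, but the core idea admits a compact illustration. The sketch below is not the authors' algorithm: it substitutes a quadratic value function learned by TD(0) for the Q-function, and the plant dynamics, identified model matrices, costs, and learning rate are all hypothetical placeholders. It shows the general pattern of planning with a linear MPC over a short horizon while a learned terminal value captures the mismatch with the true nonlinear plant.

```python
import numpy as np

def plant_step(x, u):
    """Hypothetical (unknown) nonlinear plant; used only to generate data."""
    return np.array([x[0] + 0.1 * x[1],
                     x[1] + 0.1 * (-np.sin(x[0]) + u[0])])

# Identified linear model x+ = A x + B u (assumed given by system ID).
A = np.array([[1.0, 0.1], [-0.1, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, gamma = np.eye(2), 0.1 * np.eye(1), 0.99

def mpc_action(x, P, horizon=10):
    """Linear MPC via backward Riccati recursion with the learned
    quadratic terminal cost x'Px; returns the first optimal input."""
    Pk = P.copy()
    for _ in range(horizon):
        K = np.linalg.solve(R + gamma * B.T @ Pk @ B, gamma * B.T @ Pk @ A)
        Pk = Q + K.T @ R @ K + gamma * (A - B @ K).T @ Pk @ (A - B @ K)
    return -K @ x  # gain from the last backward step applies at time 0

def td_update(P, x, u, x_next, alpha=1e-3):
    """Semi-gradient TD(0) on the quadratic value V(x) = x'Px,
    standing in for the Q-function update in the paper's scheme."""
    cost = x @ Q @ x + u @ R @ u
    delta = cost + gamma * (x_next @ P @ x_next) - x @ P @ x
    P = P + alpha * delta * np.outer(x, x)
    return 0.5 * (P + P.T)  # keep the estimate symmetric

P = np.eye(2)                     # initial value-function guess
x = np.array([1.0, 0.0])
for step in range(500):
    u = mpc_action(x, P) + 0.05 * np.random.randn(1)  # small exploration noise
    x_next = plant_step(x, u)
    P = td_update(P, x, u, x_next)
    x = x_next
```

Because the MPC plans with the identified linear model, the learned terminal value only has to correct the residual model mismatch rather than represent the whole control problem, which is one plausible reading of why such a scheme would learn faster and with lower cost variance than fully model-free RL.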