Abstract

Reinforcement learning (RL) is an artificial intelligence technique that can learn adaptive optimal control laws online. Previous control approaches are usually heavily dependent on the system's model parameters, and most existing RL methods rely on state feedback, which limits their application in actual industrial production. Moreover, as modern enterprises place a premium on product quality and economic efficiency, developing accurate process models and guaranteeing closed-loop control performance become even more challenging. This work therefore introduces a novel data-driven two-dimensional (2D) off-policy Q-learning method based on output feedback to achieve optimal tracking control of batch processes. First, the error between the actual output and the given set-point is augmented into the system to ensure good tracking performance. Second, by analyzing the relationship between the value function and the Q-function derived from the 2D system's performance index, a 2D Bellman equation is obtained in terms of output feedback that is independent of the model parameters. The proposed method solves the optimal control problem effectively by executing policy iteration using only measured system data along the batch and time directions. The unbiasedness and convergence of the proposed approach are then rigorously proven. Finally, simulation results for an injection molding process demonstrate that the proposed method converges to the optimal control law as the number of batches increases.
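The paper's full 2D (batch- and time-direction) output-feedback formulation is beyond the scope of a short sketch, but the core model-free machinery it builds on, off-policy Q-learning with policy iteration for linear-quadratic control, can be illustrated on a hypothetical time-only system. Everything below (the plant matrices `A`, `B`, the cost weights, the data sizes) is an illustrative assumption, not taken from the paper; the learner itself touches only the sampled transition data, never the model parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear plant (illustrative only; A and B are used solely to
# generate transition data -- the Q-learning loop never reads them).
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
Qc, Rc = np.eye(2), np.eye(1)           # quadratic cost weights
n, m = 2, 1                             # state / input dimensions

def phi(z):
    # Quadratic basis for z' H z with H symmetric: upper-triangular
    # products of z (cross terms are halved again in unpack()).
    return np.outer(z, z)[np.triu_indices(n + m)]

def unpack(theta):
    # Rebuild the symmetric kernel H so that phi(z) @ theta == z' H z.
    Hu = np.zeros((n + m, n + m))
    Hu[np.triu_indices(n + m)] = theta
    return (Hu + Hu.T) / 2.0

# Off-policy data: random states with an exploratory (behavior) input.
N = 400
X = rng.normal(size=(N, n))
U = rng.normal(size=(N, m))
Xn = X @ A.T + U @ B.T                  # next states

K = np.zeros((m, n))                    # initial stabilizing target policy
for _ in range(20):                     # policy iteration
    # Policy evaluation: solve the Bellman equation Q(z) = cost + Q(z')
    # in least squares, closing z' under the *target* policy u' = -K x'.
    rows, costs = [], []
    for x, u, xn in zip(X, U, Xn):
        z = np.concatenate([x, u])
        zn = np.concatenate([xn, -K @ xn])
        rows.append(phi(z) - phi(zn))
        costs.append(x @ Qc @ x + u @ Rc @ u)
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(costs), rcond=None)
    H = unpack(theta)
    # Policy improvement: argmin_u Q(x, u)  =>  u = -Huu^{-1} Hux x.
    K = np.linalg.solve(H[n:, n:], H[n:, :n])

# Model-based Riccati recursion, used here only to validate the learned gain.
P = Qc.copy()
for _ in range(1000):
    P = Qc + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(
        Rc + B.T @ P @ B, B.T @ P @ A)
K_opt = np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)

print("learned gain:", K)
print("Riccati gain:", K_opt)
```

Note the off-policy structure: the data are generated with random exploratory inputs, while the successor cost in the Bellman regression is evaluated under the current target policy, which is what lets a single batch of measurements serve every policy-iteration step.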
