Abstract

A novel two-dimensional (2D) off-policy interleaved Q-learning algorithm is proposed to solve the optimal tracking control problem for nonlinear batch processes without prior knowledge of the system dynamics or an initial control policy. The method overcomes two practical obstacles, namely the intermittent variation of the system's dynamic parameters and the difficulty of obtaining initial parameters, and it greatly reduces the computational burden of finding the optimal policy. Three neural networks (the model network, the critic network, and the action network) serve as the approximation structure through which the 2D off-policy interleaved Q-learning algorithm searches for a control policy. The weights of each network are continuously learned and updated from historical data along both the time and batch directions to obtain the optimal control policy. The convergence and optimality of the algorithm are then rigorously verified. Finally, simulation results for the injection stage confirm the validity and feasibility of the proposed algorithm.
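To make the critic/actor interplay described above concrete, the following is a minimal sketch of an off-policy Q-learning loop with interleaved policy-evaluation (critic) and policy-improvement (actor) steps. It is not the paper's algorithm: the 2D (time and batch direction) structure, the nonlinearity, and the model network are omitted, and a scalar linear process with a quadratic cost stands in for the batch process so that a linear gain and a quadratic Q-function suffice. All numeric values (`a`, `b`, `q`, `r`, `gamma`) are hypothetical, chosen only to keep the example runnable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar process standing in for the batch process:
#   x_{t+1} = a*x_t + b*u_t, regulate x to 0.
# (a, b) are unknown to the learner and are used only to generate data.
a, b = 0.9, 0.5
q, r, gamma = 1.0, 0.1, 0.95

def cost(x, u):
    # stage cost of the tracking/regulation problem
    return q * x * x + r * u * u

def features(x, u):
    # quadratic Q-function basis: Q(x, u) = theta . [x^2, x*u, u^2]
    return np.array([x * x, x * u, u * u])

K = 0.0                  # "action network" here is just a linear gain: u = -K*x
theta = np.zeros(3)      # critic weights

for it in range(20):
    # --- critic step: evaluate the current policy from off-policy data ---
    # The exploratory inputs need not follow the current policy; only the
    # *target-policy* action at the next state enters the Bellman equation.
    A_rows, c_rows = [], []
    for _ in range(30):
        x = rng.uniform(-2, 2)
        u = rng.uniform(-2, 2)          # arbitrary behavior input
        x_next = a * x + b * u          # measured transition
        u_next = -K * x_next            # target-policy action
        A_rows.append(features(x, u) - gamma * features(x_next, u_next))
        c_rows.append(cost(x, u))
    theta, *_ = np.linalg.lstsq(np.array(A_rows), np.array(c_rows), rcond=None)
    # --- actor step: improve the policy from the critic ---
    # Q = t0*x^2 + t1*x*u + t2*u^2, so dQ/du = 0 gives u = -(t1 / (2*t2)) * x
    K = theta[1] / (2.0 * theta[2])
```

In the paper's setting, the least-squares critic and the analytic actor minimization above would each be replaced by a neural-network weight update, and the data would be indexed along both the time and the batch direction.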
