Experience replay–based output feedback Q‐learning scheme for optimal output tracking control of discrete‐time linear systems

Syed Ali Asad Rizvi, Zongli Lin

doi:10.1002/acs.2981

Abstract

This paper addresses the adaptive optimal tracking control problem for discrete‐time linear systems with unknown dynamics using output feedback. A Q‐learning‐based adaptive optimal control scheme is presented that learns the feedback and feedforward parameters of the optimal tracking control law. The optimal feedback parameters are learned using the proposed output feedback Q‐learning Bellman equation, whereas the optimal feedforward parameters are estimated by an adaptive algorithm that guarantees convergence of the tracking error to zero. The proposed method is not affected by the exploration noise bias problem and does not require a discounting factor, which removes two bottlenecks that have prevented past works from guaranteeing stability and achieving optimal asymptotic tracking. Furthermore, the proposed scheme employs the experience replay technique for data‐driven learning, which is data efficient and relaxes the persistence of excitation requirement in learning the feedback control parameters. It is shown that the learned feedback control parameters converge to the optimal solution of the Riccati equation and that the feedforward control parameters converge to the solution of the Sylvester equation. Simulation studies on two practical systems demonstrate the effectiveness of the proposed scheme.
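To make the two convergence targets concrete, the following is a minimal model-based sketch, not the paper's Q-learning algorithm. It assumes a scalar plant x(k+1) = a x(k) + b u(k), y(k) = c x(k) with known parameters and a constant reference generated by w(k+1) = s w(k), r(k) = f w(k). All names and numerical values are hypothetical and chosen only for illustration. The feedback gain comes from iterating the Riccati recursion, and the feedforward terms come from the scalar form of the Sylvester-type regulator equations X s = a X + b U, c X = f.

```python
def riccati_gain(a, b, q, r, iters=500):
    """Iterate the scalar discrete-time Riccati recursion to a fixed point.

    Returns the cost parameter p and the optimal feedback gain k.
    """
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    k = a * b * p / (r + b * b * p)
    return p, k


def regulator_solution(a, b, c, s, f):
    """Solve X*s = a*X + b*U and c*X = f for the scalars X and U."""
    x_ss = f / c
    u_ss = (s - a) * x_ss / b
    return x_ss, u_ss


def simulate(a, b, c, s, f, k, x_ss, u_ss, x0=0.0, w0=1.0, steps=60):
    """Apply u = -k*(x - X*w) + U*w and record the tracking error y - r."""
    x, w = x0, w0
    errors = []
    for _ in range(steps):
        errors.append(c * x - f * w)
        u = -k * (x - x_ss * w) + u_ss * w
        x = a * x + b * u
        w = s * w
    return errors


# Hypothetical numbers: an open-loop unstable plant tracking a constant reference.
a, b, c = 1.2, 1.0, 1.0
s, f = 1.0, 2.0
q_y, r = 1.0, 1.0          # output and input weights (assumed)
q = c * c * q_y            # equivalent state weight

p, k = riccati_gain(a, b, q, r)
x_ss, u_ss = regulator_solution(a, b, c, s, f)
errors = simulate(a, b, c, s, f, k, x_ss, u_ss)
print(f"k = {k:.4f}, X = {x_ss:.4f}, U = {u_ss:.4f}, final error = {errors[-1]:.2e}")
```

In this sketch the tracking error decays at the closed-loop rate a - b k without any discounting factor. A data-driven scheme such as the one summarized above would have to recover the same feedback and feedforward parameters from input-output data rather than from a, b, and c.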

Full Text