Output feedback Q-learning for discrete-time linear zero-sum games with application to the H-infinity control

Syed Ali Asad Rizvi, Zongli Lin

doi:10.1016/j.automatica.2018.05.027

Abstract

Approximate dynamic programming techniques usually rely on feedback of the complete state, which is generally not measurable in practice. In this paper, we present an output feedback Q-learning algorithm for finding the optimal strategies of the discrete-time linear quadratic zero-sum game, which encompasses the H-infinity optimal control problem. A new representation of the Q-function in output feedback form is derived for the zero-sum game problem, and the optimal output feedback policies are presented. A Q-learning algorithm is then developed that learns the optimal control strategies online without any information about the system dynamics, which makes the control design completely model-free. It is shown that the proposed algorithm converges to the optimal solution obtained by solving the game algebraic Riccati equation (GARE). Unlike the value-function-based approach used for output feedback, the proposed Q-learning scheme does not require a discounting factor, which is generally adopted to mitigate the effect of excitation noise bias and is known to potentially compromise closed-loop stability. The proposed method overcomes the excitation noise bias problem without resorting to a discounting factor and therefore converges to the nominal GARE solution. As a result, closed-loop stability is preserved. An application to an H-infinity autopilot controller for the F-16 aircraft is demonstrated by simulation.
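To make the GARE benchmark mentioned above concrete, the following is a minimal, model-based sketch and not the paper's model-free output feedback algorithm. It assumes the standard state-space form x_{k+1} = A x_k + B u_k + E w_k with stage cost x'Qx + u'Ru - gamma^2 w'w, and it iterates the Riccati recursion from P = 0 (plain value iteration). The matrices, the value of gamma, and the function names `solve_gare` and `gare_residual` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def gare_residual(P, A, B, E, Q, R, gamma):
    """Max-abs residual of the game algebraic Riccati equation at P."""
    m, q = B.shape[1], E.shape[1]
    G = np.hstack([B, E])  # stacked input matrix [B E]
    # Weight on [u; w]: control is penalized by R, disturbance enters with -gamma^2 I.
    W = np.block([[R, np.zeros((m, q))],
                  [np.zeros((q, m)), -gamma**2 * np.eye(q)]])
    M = W + G.T @ P @ G
    rhs = Q + A.T @ P @ A - A.T @ P @ G @ np.linalg.solve(M, G.T @ P @ A)
    return np.max(np.abs(P - rhs))


def solve_gare(A, B, E, Q, R, gamma, tol=1e-12, max_iter=10_000):
    """Value iteration on the GARE, starting from P = 0 (model-based sketch)."""
    n = A.shape[0]
    m, q = B.shape[1], E.shape[1]
    G = np.hstack([B, E])
    W = np.block([[R, np.zeros((m, q))],
                  [np.zeros((q, m)), -gamma**2 * np.eye(q)]])
    P = np.zeros((n, n))
    for _ in range(max_iter):
        M = W + G.T @ P @ G
        P_next = Q + A.T @ P @ A - A.T @ P @ G @ np.linalg.solve(M, G.T @ P @ A)
        P_next = 0.5 * (P_next + P_next.T)  # keep the iterate symmetric
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    # Saddle-point policies: [u; w] = -M^{-1} G' P A x, split into K (control) and L (disturbance).
    M = W + G.T @ P @ G
    gains = -np.linalg.solve(M, G.T @ P @ A)
    return P, gains[:m], gains[m:]


# Hypothetical stable two-state example; all numbers are assumptions for illustration.
A = np.array([[0.8, 0.1],
              [0.0, 0.7]])
B = np.array([[0.0],
              [1.0]])
E = np.array([[1.0],
              [0.0]])
Q = np.eye(2)
R = np.array([[1.0]])
gamma = 10.0

P, K, L = solve_gare(A, B, E, Q, R, gamma)
residual = gare_residual(P, A, B, E, Q, R, gamma)
print("P =\n", P)
print("K =", K, " L =", L)
print("GARE residual =", residual)
```

A solution of this kind is what a learned Q-function would be compared against in simulation; the sketch uses full knowledge of A, B, and E, whereas the algorithm in the paper is stated to need neither the model nor the full state.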
