Abstract

This paper develops a novel off-policy game Q-learning algorithm to solve the anti-interference control problem for discrete-time linear multi-player systems using only measured data, without requiring the system matrices to be known. The primary contribution is that the Q-learning strategy in the proposed algorithm is implemented via off-policy policy iteration rather than on-policy learning, owing to the well-known advantages of off-policy Q-learning over its on-policy counterpart. All players cooperate to minimize their common performance index while counteracting the disturbance, which attempts to maximize that index; the players ultimately reach the Nash equilibrium of the game, at which the disturbance attenuation condition is satisfied. To find the Nash equilibrium solution, the anti-interference control problem is first transformed into an optimal control problem. An off-policy Q-learning algorithm is then proposed within the standard adaptive dynamic programming (ADP) and game-theoretic framework, so that the control policies of all players can be learned using only measured data. Comparative simulation results are provided to verify the effectiveness of the proposed method.
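To illustrate the kind of procedure the abstract describes, the following is a minimal, hypothetical Python sketch of off-policy Q-learning for a discrete-time linear system with one control player and one disturbance player (a simplified two-player zero-sum instance of the multi-player setting). The plant matrices A, B, D, the weights Qx, R, and the attenuation level gamma are illustrative assumptions used only to simulate data; the learning step itself uses the recorded samples alone, and the exact update equations of the paper's algorithm may differ.

```python
# Hypothetical sketch: off-policy game Q-learning via policy iteration.
# A single batch of data is collected once under an exploratory behavior
# policy and then reused for every policy-evaluation step (off-policy).
import numpy as np

rng = np.random.default_rng(0)

# Simulated plant (unknown to the learner): x_{k+1} = A x_k + B u_k + D w_k
A = np.array([[0.8, 0.2],
              [0.0, 0.7]])
B = np.array([[0.0],
              [1.0]])
D = np.array([[0.1],
              [0.0]])
n, m, q = 2, 1, 1

Qx = np.eye(n)      # state weight (assumed)
R = np.eye(m)       # control weight (assumed)
gamma = 5.0         # disturbance attenuation level (assumed large enough)

# --- Collect one batch of data with an exploratory behavior policy ---
N = 200
X, U, W, Xn = [], [], [], []
x = np.array([1.0, -1.0])
for _ in range(N):
    u = 0.5 * rng.standard_normal(m)      # exploratory control input
    w = 0.5 * rng.standard_normal(q)      # exploratory disturbance input
    xn = A @ x + B @ u + D @ w
    X.append(x); U.append(u); W.append(w); Xn.append(xn)
    x = xn
X, U, W, Xn = map(np.array, (X, U, W, Xn))

# --- Off-policy policy iteration on the quadratic Q-function z'Hz ---
K = np.zeros((m, n))    # control policy  u = K x
L = np.zeros((q, n))    # disturbance policy  w = L x
p = n + m + q

for it in range(20):
    # Policy evaluation: solve z_k'H z_k - z_{k+1}'H z_{k+1} = r_k in the
    # least-squares sense, where z_{k+1} uses the *target* policies (K, L)
    # although the data came from the behavior policy (off-policy step).
    Phi, Y = [], []
    for k in range(N):
        z = np.concatenate([X[k], U[k], W[k]])
        zn = np.concatenate([Xn[k], K @ Xn[k], L @ Xn[k]])
        Phi.append(np.kron(z, z) - np.kron(zn, zn))
        r = X[k] @ Qx @ X[k] + U[k] @ R @ U[k] - gamma**2 * W[k] @ W[k]
        Y.append(r)
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(Y), rcond=None)
    H = theta.reshape(p, p)
    H = 0.5 * (H + H.T)          # only the symmetric part is identifiable

    # Policy improvement: stationarity of the Q-function in (u, w)
    Huw = H[n:, n:]                 # block over the stacked inputs [u; w]
    Hxs = H[n:, :n]                 # cross terms with the state
    G = -np.linalg.solve(Huw, Hxs)  # stacked gains [K_new; L_new]
    K_new, L_new = G[:m, :], G[m:, :]

    if np.linalg.norm(K_new - K) + np.linalg.norm(L_new - L) < 1e-8:
        K, L = K_new, L_new
        break
    K, L = K_new, L_new

print("learned control gain K:\n", K)
print("learned disturbance gain L:\n", L)
```

The key off-policy feature shown here is that the same recorded batch serves every iteration: only the target gains (K, L) inside the next-state feature change, so no new experiments are needed as the policies are refined.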
