Abstract

Within the framework of adaptive dynamic programming combined with Q-learning, this paper investigates networked multi-player games in which the common plant state is transmitted to all players over a network. The goal is to find the Nash equilibrium solution without knowledge of the system matrices, despite network-induced delay and a system state that cannot be measured directly. By introducing an observer to estimate the system state and a virtual Smith predictor to predict it across the delay, the players' control policies can be designed. A novel off-policy Q-learning algorithm is then proposed that learns the Nash equilibrium solution by solving the coupled algebraic Riccati equations from available data, and the convergence of the algorithm is rigorously proven. Finally, an example demonstrates the effectiveness of the proposed method.
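For context, a minimal sketch of the kind of coupled algebraic Riccati equations referred to above, written for the standard continuous-time N-player linear-quadratic game with full state feedback. This is an assumed baseline formulation, not the paper's exact networked, delayed, observer-based setting, whose equations will differ in detail.

```latex
% Assumed baseline: N-player LQ game (not the paper's exact formulation)
% dynamics: \dot{x} = A x + \sum_{j=1}^{N} B_j u_j
% cost of player i: J_i = \int_0^\infty \big( x^\top Q_i x + \sum_{j=1}^{N} u_j^\top R_{ij} u_j \big)\, dt
\begin{align}
  u_i^{*} &= -K_i x, \qquad K_i = R_{ii}^{-1} B_i^\top P_i, \\
  0 &= \Big(A - \sum_{j=1}^{N} B_j K_j\Big)^{\!\top} P_i
     + P_i \Big(A - \sum_{j=1}^{N} B_j K_j\Big)
     + Q_i + \sum_{j=1}^{N} K_j^\top R_{ij} K_j,
     \qquad i = 1,\dots,N.
\end{align}
```

A model-free scheme of the type described in the abstract would iterate on data-based Q-function estimates of the matrices $P_i$ rather than solving these equations directly from known $A$ and $B_j$.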
