Abstract

This article presents an online off-policy policy iteration (PI) algorithm based on reinforcement learning (RL) to solve the optimal distributed synchronization problem for nonlinear multiagent systems (MASs). First, considering that not every follower can directly obtain the leader's information, a novel adaptive model-free observer based on neural networks (NNs) is designed, and its feasibility is rigorously proved. Subsequently, combining the observer with the follower dynamics, an augmented system and a distributed cooperative performance index with a discount factor are established. On this basis, the optimal distributed cooperative synchronization problem reduces to numerically solving the Hamilton-Jacobi-Bellman (HJB) equation. Finally, an online off-policy algorithm is proposed, which optimizes the distributed synchronization of the MASs in real time from measured data. To make the stability and convergence analysis of the online off-policy algorithm more tractable, an offline on-policy algorithm is introduced first, and its stability and convergence are proved. A novel mathematical analysis method for establishing the stability of the algorithm is given. Simulation results verify the effectiveness of the theory.
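
As a hedged sketch only (the abstract does not state the exact formulas), a typical discounted distributed cooperative performance index, the associated HJB equation, and the resulting optimal policy for agent i's augmented synchronization-error dynamics can be written as follows. The notation below (error e_i, control u_i, discount factor gamma, weights Q_i, R_ij, neighbor set N_i, dynamics f_i, g_i) is illustrative and may differ from the authors' definitions.

\[
V_i\big(e_i(t)\big) = \int_{t}^{\infty} e^{-\gamma(\tau - t)}
\Big( e_i^{\top} Q_i e_i + u_i^{\top} R_{ii} u_i
+ \sum_{j \in \mathcal{N}_i} u_j^{\top} R_{ij} u_j \Big)\, d\tau
\]

\[
0 = \nabla V_i^{\top}\big( f_i(e_i) + g_i(e_i) u_i^{*} \big) - \gamma V_i
+ e_i^{\top} Q_i e_i + {u_i^{*}}^{\top} R_{ii} u_i^{*}
+ \sum_{j \in \mathcal{N}_i} {u_j^{*}}^{\top} R_{ij} u_j^{*},
\qquad
u_i^{*} = -\tfrac{1}{2} R_{ii}^{-1} g_i^{\top}(e_i)\, \nabla V_i .
\]

In a PI scheme of this kind, the value function and policy are typically approximated by NNs and updated iteratively; in the off-policy setting, the data used for these updates may be generated by a behavior policy different from the policy being evaluated.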
