Abstract

This paper presents a novel adaptive dynamic programming (ADP) method to solve the optimal consensus problem for a class of discrete‐time multi‐agent systems with completely unknown dynamics. Unlike classical reinforcement-learning-based optimal control algorithms built on the one‐step temporal difference method, a multi‐step (also called n‐step) policy gradient ADP (MS‐PGADP) algorithm, which has been shown to be more efficient owing to its faster propagation of the reward, is proposed to obtain the iterative control policies. Moreover, a novel Q‐function is defined, which estimates the performance of taking an action in the current state. Then, using the Lyapunov stability theorem and functional analysis, the optimality of the performance index function is proved and the stability of the error system is established. Furthermore, actor‐critic neural networks are used to implement the proposed method. Inspired by the deep Q‐network (DQN), a target network is also introduced to stabilize the neural networks during training. Finally, two simulations are conducted to verify the effectiveness of the proposed algorithm.
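To make the two ingredients the abstract highlights concrete, the sketch below illustrates a generic n‐step temporal‐difference target combined with a periodically synchronized target network, in the style of DQN training. This is a minimal illustration only, not the paper's MS‐PGADP algorithm; all names (QNet, n_step_target, gamma, sync_every) and the toy linear critic are hypothetical choices made for the example.

```python
import numpy as np

class QNet:
    """Tiny linear Q-function approximator over (state, action) features."""
    def __init__(self, dim, lr=0.05):
        self.w = np.zeros(dim)
        self.lr = lr

    def value(self, feats):
        return float(self.w @ feats)

    def update(self, feats, target):
        # One gradient step on the squared TD error (target held fixed).
        self.w += self.lr * (target - self.value(feats)) * feats

def n_step_target(rewards, bootstrap, gamma):
    """n-step return: sum_{k=0}^{n-1} gamma^k * r_k + gamma^n * bootstrap."""
    g = sum((gamma ** k) * r for k, r in enumerate(rewards))
    return g + (gamma ** len(rewards)) * bootstrap

rng = np.random.default_rng(0)
critic, target_critic = QNet(dim=4), QNet(dim=4)
gamma, n, sync_every = 0.95, 3, 50

for step in range(200):
    # Hypothetical trajectory fragment: n+1 feature vectors and n rewards.
    feats = rng.normal(size=(n + 1, 4))
    rewards = rng.normal(size=n)
    # Bootstrap from the frozen target network, as in DQN-style training;
    # the n-step target propagates reward information n steps per update.
    y = n_step_target(rewards, target_critic.value(feats[n]), gamma)
    critic.update(feats[0], y)
    if step % sync_every == 0:
        target_critic.w = critic.w.copy()  # periodic hard synchronization
```

Compared with a one‐step target, the n‐step return folds n immediate rewards into each update before bootstrapping, which is the faster reward propagation the abstract refers to; freezing the bootstrap network between synchronizations keeps the regression target stationary during training.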
