This paper studies optimal output consensus control of linear time-invariant discrete-time multi-agent systems (MASs). Achieving optimal output consensus requires solving the coupled Hamilton–Jacobi–Bellman equations, which generally admit no analytical solution. Moreover, the internal states of most real-world systems are too complex to be measured directly. To address these issues, a modified deep Q-learning network is constructed from current and historical system data rather than a precise model of the system. First, the internal state of each agent is reconstructed by an output-feedback-based adaptive distributed observer, which avoids the instability introduced by augmented systems. The local error system of each agent is then redefined. Based on the redefined error system, a data-driven adaptive dynamic programming (ADP) method is introduced and realized with an actor–critic neural network structure. In addition, an experience replay strategy is proposed to reduce the propagation of estimation bias and to accelerate learning. Finally, comparative numerical simulations quantitatively demonstrate the effectiveness of the proposed algorithm.
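To make the data-driven actor–critic idea concrete, the following is a minimal sketch, not the paper's exact algorithm: a quadratic critic and a linear actor are updated from replayed transitions of a single agent's local error system. The dynamics matrices A and B (used here only to generate data), the cost weights Q and R, the discount factor, learning rates, and buffer size are all illustrative assumptions.

```python
# Minimal sketch of data-driven actor-critic ADP with experience replay
# for one agent's redefined local error system e_{k+1} = A e_k + B u_k.
# All numerical values below are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

# Assumed local error dynamics (unknown to the learner; used only to simulate data)
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
n, m = 2, 1

K = np.zeros((m, n))                           # actor: linear feedback u = -K e
W = np.zeros((n + m) * (n + m + 1) // 2)       # critic weights for a quadratic Q-function

def features(e, u):
    """Quadratic basis of the joint state-action vector z = [e; u]."""
    z = np.concatenate([e, u])
    return np.outer(z, z)[np.triu_indices(len(z))]

buffer = []                                    # experience replay buffer of transitions
gamma, lr_c, lr_a = 0.95, 0.05, 0.01

e = rng.standard_normal(n)
for k in range(2000):
    u = -K @ e + 0.1 * rng.standard_normal(m)  # exploratory control input
    r = e @ Q @ e + u @ R @ u                  # stage cost from measured data
    e_next = A @ e + B @ u
    buffer.append((e, u, r, e_next))
    if len(buffer) > 500:
        buffer.pop(0)

    # Critic update: temporal-difference step on a replayed minibatch
    batch = [buffer[i] for i in rng.integers(0, len(buffer), size=32)]
    for (ei, ui, ri, eni) in batch:
        un = -K @ eni
        td = ri + gamma * W @ features(eni, un) - W @ features(ei, ui)
        W += lr_c * td * features(ei, ui)

    # Actor update: move the gain so the action descends the critic's Q-estimate
    for (ei, _, _, _) in batch[:8]:
        ui = -K @ ei
        eps = 1e-3
        grad = np.array([
            (W @ features(ei, ui + eps * np.eye(m)[j])
             - W @ features(ei, ui - eps * np.eye(m)[j])) / (2 * eps)
            for j in range(m)
        ])
        K += lr_a * np.outer(grad, ei)         # since u = -K e, this step reduces Q(e, u)

    e = e_next

print("Learned feedback gain K:\n", K)
```

Replaying stored transitions lets each minibatch reuse past data, which is what reduces the variance of the temporal-difference updates and speeds up learning relative to purely on-policy single-sample updates; the paper's distributed, output-feedback version of this idea operates on observer-reconstructed states rather than measured ones.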