Abstract

This paper investigates the finite-horizon optimal consensus control problem for unknown multiagent systems with state delays. It is well known that the optimal consensus control is obtained from the solutions to the coupled Hamilton-Jacobi-Bellman (HJB) equations. An off-policy reinforcement learning (RL) algorithm is developed to learn the two-stage optimal consensus solutions to the coupled time-varying HJB equations using measurable state data rather than knowledge of the state-delayed system dynamics. For each agent, a single critic neural network (NN) is then used to approximate the time-varying cost function and to help compute the optimal consensus control policy. Adaptive weight update laws for the critic NNs are proposed based on the method of weighted residuals. Finally, simulation results are provided to illustrate the effectiveness of the proposed off-policy RL method.

