Abstract
Solving for the Nash equilibrium is important in multi-agent game systems, and the speed at which the equilibrium is reached is critical for agents that must make real-time decisions. A typical scheme is model-free reinforcement learning based on policy iteration, which is slow because each iteration must be computed from the start state to the end state. In this paper, we propose a faster scheme based on value iteration that uses a Q-function in an online manner to solve for the Nash equilibrium of the system. Because each calculation builds on the value from the previous iteration, the proposed scheme converges much faster than policy iteration. The rationality and convergence of the scheme are analyzed and proven theoretically, and an actor-critic network structure is used to implement it in simulation. The simulation results show that the proposed scheme converges about 10 times faster than the policy iteration algorithm.
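To make the contrast concrete, below is a minimal, hypothetical sketch of Q-function value iteration for a single linear-quadratic agent: each sweep bootstraps from the value kernel of the previous iteration rather than re-evaluating a policy along whole trajectories. The dynamics, the cost weights, and the name q_value_iteration are illustrative assumptions, not the paper's multi-agent formulation.

    import numpy as np

    # Minimal, hypothetical sketch of Q-function value iteration for one
    # linear-quadratic agent; matrices below are illustrative assumptions,
    # not the paper's multi-agent game formulation.

    def q_value_iteration(A, B, Q, R, n_iter=200):
        """Iterate the Q-function kernel directly. Q and R are the state and
        input cost weights. Each sweep bootstraps from the previous value
        kernel P instead of re-solving whole trajectories, which is what makes
        value iteration cheaper per iteration than policy iteration."""
        n, m = B.shape
        P = np.zeros((n, n))                    # value kernel V(x) = x' P x, start from zero
        K = np.zeros((m, n))
        for _ in range(n_iter):
            # Q-function kernel blocks: Q(x, u) = [x; u]' H [x; u]
            Hxx = Q + A.T @ P @ A
            Hxu = A.T @ P @ B
            Huu = R + B.T @ P @ B
            K = np.linalg.solve(Huu, Hxu.T)     # greedy control u = -K x
            P = Hxx - Hxu @ K                   # value-iteration update of the kernel
        return K, P

    if __name__ == "__main__":
        A = np.array([[1.0, 0.1], [0.0, 1.0]])  # illustrative double-integrator dynamics
        B = np.array([[0.0], [0.1]])
        Q, R = np.eye(2), np.eye(1)
        K, P = q_value_iteration(A, B, Q, R)
        print("converged feedback gain K:", K)

Policy iteration, by contrast, would fully evaluate the cost of the current feedback gain before improving it; the sketch above folds the greedy improvement into every sweep, which is the source of the per-iteration speedup the abstract describes.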
Highlights
Multi-agent consensus research concerns the knowledge, goals, skills, and planning needed to enable agents to take coordinated actions to solve problems.
We propose a value iteration algorithm that solves for the Nash equilibrium of multi-agent game systems by designing a cooperative RL algorithm in which the agents jointly use a Q-function in an online manner.
C) All agents are in Nash equilibrium, with $J_i(u_i^*, u_{-i}^*) \le J_i(u_i, u_{-i}^*)$ [6], [25] (spelled out below).
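Written with the quantifiers made explicit, the Nash condition in that highlight reads as follows (our rendering; the costs $J_i$ and inputs $u_i$ follow the bullet above):

    % Nash equilibrium: no agent i can lower its cost J_i by deviating
    % unilaterally from u_i^* while the other agents keep playing u_{-i}^*.
    J_i\!\left(u_i^{*}, u_{-i}^{*}\right) \le J_i\!\left(u_i, u_{-i}^{*}\right),
    \qquad \text{for all admissible } u_i \text{ and all agents } i.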
Summary
Multi-agent consensus research concerns the knowledge, goals, skills, and planning needed to enable agents to take coordinated actions to solve problems, and model-free algorithms are an important research direction for multi-agent systems operating in unknown environments [20], [21]. We propose a value iteration algorithm that solves for the Nash equilibrium of multi-agent game systems by designing a cooperative RL algorithm in which the agents jointly use a Q-function in an online manner.

The paper's notation includes:
- the system matrix and the input matrix of agent i;
- the control input of agent i and the control inputs of its neighbors;
- the local neighborhood tracking error ε_i of agent i and the tracking errors ε_{-i} of its neighbors;
- the vector collecting ε_i and ε_{-i};
- the synchronization error vector.
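The abstract mentions an actor-critic network structure for the simulation. The following is a hypothetical per-agent skeleton consistent with that description, in which the critic approximates the agent's Q-function over the local neighborhood tracking error and the controls, and the actor is improved greedily against it. Layer sizes, the update rule, and all names (Critic, Actor, online_update) are our assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class Critic(nn.Module):
        """Approximates agent i's Q-function over the tracking error and controls."""
        def __init__(self, eps_dim, u_dim, u_neigh_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(eps_dim + u_dim + u_neigh_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, 1),
            )

        def forward(self, eps, u_i, u_neigh):
            return self.net(torch.cat([eps, u_i, u_neigh], dim=-1))

    class Actor(nn.Module):
        """Maps the local neighborhood tracking error to agent i's control input."""
        def __init__(self, eps_dim, u_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(eps_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, u_dim),
            )

        def forward(self, eps):
            return self.net(eps)

    def online_update(actor, critic, opt_actor, opt_critic, batch, gamma=1.0):
        """One online value-iteration step: the critic target bootstraps from the
        current critic (the value of the last iteration), then the actor is
        improved greedily against the updated critic."""
        eps, u_i, u_neigh, cost, eps_next, u_neigh_next = batch
        with torch.no_grad():
            q_next = critic(eps_next, actor(eps_next), u_neigh_next)
            target = cost + gamma * q_next                     # bootstrap, no rollout to the end state
        critic_loss = nn.functional.mse_loss(critic(eps, u_i, u_neigh), target)
        opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

        actor_loss = critic(eps, actor(eps), u_neigh).mean()   # minimise predicted cost-to-go
        opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()

Because the target is built from the current critic rather than from a full policy evaluation over a start-to-end trajectory, each update is cheap, which mirrors the per-iteration advantage of value iteration over policy iteration claimed in the abstract.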