Abstract

This paper proposes a novel multiagent reinforcement learning (MARL) algorithm, Nash-Q learning with regret matching, in which regret matching is used to speed up the well-known MARL algorithm Nash-Q learning. Choosing a suitable action-selection strategy that balances exploration and exploitation is critical to enhancing the online learning ability of Nash-Q learning. In a Markov game, the joint action of agents adopting the regret matching algorithm converges to a set of no-regret points, which can be viewed as a coarse correlated equilibrium and which in essence includes Nash equilibrium. It can therefore be inferred that regret matching can guide exploration of the state-action space so that the convergence rate of the Nash-Q learning algorithm is increased. Simulation results on robot soccer validate that, compared to the original Nash-Q learning algorithm, using regret matching during the learning phase of Nash-Q learning yields strong online learning ability and significantly better performance in terms of scores, average reward, and policy convergence.
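As a rough illustration of how regret matching can drive action selection during learning, the sketch below keeps a cumulative regret for each action and samples actions in proportion to positive regret, falling back to uniform exploration when no regret is positive. This is a minimal sketch of the regret-matching rule in general form; the class and variable names are illustrative and not taken from the paper.

```python
import numpy as np

class RegretMatcher:
    """Minimal regret-matching action selector (illustrative sketch)."""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        # cumulative_regret[a]: how much better action a would have done,
        # in total, than the actions actually played so far.
        self.cumulative_regret = np.zeros(n_actions)

    def select_action(self):
        positive = np.maximum(self.cumulative_regret, 0.0)
        total = positive.sum()
        if total > 0:
            probs = positive / total          # play proportionally to positive regret
        else:
            probs = np.full(self.n_actions, 1.0 / self.n_actions)  # uniform fallback
        return np.random.choice(self.n_actions, p=probs)

    def update(self, played_action, counterfactual_rewards):
        # counterfactual_rewards[a]: reward the agent would have received had it
        # played a while the other agents kept their part of the joint action.
        self.cumulative_regret += (
            counterfactual_rewards - counterfactual_rewards[played_action]
        )
```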

Highlights

  • Multi-robot systems (MRS) have received increasing attention because of their broad application prospects, with research platforms including formation [1], foraging [2], prey pursuit [3, 4], and robot soccer [5,6,7]

  • This paper proposes a novel multiagent reinforcement learning (MARL) algorithm, Nash-Q learning with regret matching, in which regret matching is used to speed up the well-known MARL algorithm Nash-Q learning

  • For MRS, action selection by the learning robot is unavoidably affected by the actions of other agents, so multiagent reinforcement learning (MARL), which involves joint states and joint actions, is a more suitable and promising method for MRS [13,14,15,16]


Summary

Introduction

Multi-robot systems (MRS) have received increasing attention because of their broad application prospects, with research platforms including formation [1], foraging [2], prey pursuit [3, 4], and robot soccer [5,6,7]. Agents adopting equilibrium-based MARL algorithms can be called equilibrium learners [17, 20, 21], which is one way of handling the loss of stationarity of the MDP. These algorithms learn joint action values, which are stationary and, in certain circumstances, are guaranteed to converge to Nash equilibrium (NE) values [22] or correlated equilibrium (CE) values. Regret matching [25], which belongs to the family of no-regret algorithms, guarantees that the joint action asymptotically converges to a set of no-regret points, which in a Markov game can be regarded as a coarse correlated equilibrium [28]. Because a Nash equilibrium is a coarse correlated equilibrium [28], it can be inferred that regret matching, which leads the joint action toward coarse correlated equilibrium points, can effectively improve the convergence rate of the original Nash-Q learning algorithm.
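For context, the sketch below shows the general shape of a Nash-Q learning update that such exploration is meant to accelerate. It assumes two agents with tabular joint-action values and leaves the stage-game Nash solver as a supplied function; all names here are illustrative assumptions rather than the paper's own notation.

```python
def nash_q_update(Q, state, actions, rewards, next_state,
                  stage_game_nash_value, alpha=0.1, gamma=0.9):
    """One Nash-Q learning update for a two-agent Markov game (sketch).

    Q[i] is agent i's table of joint-action values indexed by (state, a0, a1).
    """
    a0, a1 = actions
    for i in (0, 1):
        # stage_game_nash_value is assumed to return agent i's payoff at a Nash
        # equilibrium of the bimatrix stage game defined by Q[0][next_state]
        # and Q[1][next_state]; solving this stage game is the core step of
        # Nash-Q learning.
        nash_value = stage_game_nash_value(Q[0][next_state], Q[1][next_state], i)
        Q[i][state, a0, a1] = ((1 - alpha) * Q[i][state, a0, a1]
                               + alpha * (rewards[i] + gamma * nash_value))
    return Q
```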

Multiagent Reinforcement Learning and Nash-Q Learning
Regret Matching Algorithm for Action Selection
Action-Based Soccer Robot
Simulation
Conclusion