Abstract

In this article, we investigate how multiple agents can learn to coordinate for efficient exploration in reinforcement learning. Although straightforward, independent exploration of the joint action space becomes exponentially more difficult as the number of agents increases. To tackle this problem, we propose feudal latent-space exploration (FLE) for multi-agent reinforcement learning (MARL). FLE introduces a feudal commander that learns a low-dimensional global latent structure to guide the agents' coordinated exploration. Under this framework, the multi-agent policy gradient (PG) is adopted to optimize both the agent policies and the latent structure end-to-end. We demonstrate the effectiveness of this method in two multi-agent environments that require explicit coordination. Experimental results show that FLE outperforms baseline MARL approaches that use independent exploration strategies in terms of mean reward, efficiency, and the expressiveness of coordination policies.
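The following is a minimal sketch of the idea outlined above, assuming a PyTorch implementation: a commander network maps the global state to a low-dimensional latent code, each agent's policy conditions on its local observation together with that shared code, and a single policy-gradient loss backpropagates through both the agents and the commander. All class names, network sizes, dimensions, and the placeholder return are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of the FLE idea (names and dimensions are assumptions).
import torch
import torch.nn as nn


class Commander(nn.Module):
    """Maps the global state to a low-dimensional latent code z."""

    def __init__(self, state_dim: int, latent_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


class AgentPolicy(nn.Module):
    """Per-agent policy conditioned on the local observation and the shared latent."""

    def __init__(self, obs_dim: int, latent_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )

    def forward(self, obs: torch.Tensor, z: torch.Tensor):
        logits = self.net(torch.cat([obs, z], dim=-1))
        return torch.distributions.Categorical(logits=logits)


# Example forward pass for 3 agents with a REINFORCE-style loss, so gradients
# flow through both the agent policies and the commander end-to-end.
state_dim, obs_dim, latent_dim, n_actions, n_agents = 16, 8, 4, 5, 3
commander = Commander(state_dim, latent_dim)
agents = [AgentPolicy(obs_dim, latent_dim, n_actions) for _ in range(n_agents)]

state = torch.randn(1, state_dim)                          # global state (placeholder)
obs = [torch.randn(1, obs_dim) for _ in range(n_agents)]   # local observations

z = commander(state)                                       # shared low-dimensional latent code
log_probs = []
for agent, o in zip(agents, obs):
    dist = agent(o, z)
    action = dist.sample()
    log_probs.append(dist.log_prob(action))

ret = torch.tensor(1.0)                                    # placeholder episode return
loss = -(torch.stack(log_probs).sum() * ret)               # joint policy-gradient objective
loss.backward()                                            # updates agents and commander together
```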
