Abstract

Multi-agent reinforcement learning (MARL) algorithms have achieved strong results in many scenarios, but they still struggle with sequential social dilemmas (SSDs). In SSDs, an agent's actions not only change the instantaneous state of the environment but also affect a latent state that, in turn, affects all agents. However, most current reinforcement learning algorithms evaluate only the instantaneous environment state and ignore the latent state, which is crucial for establishing cooperation. We therefore propose a novel counterfactual reasoning-based multi-agent reinforcement learning algorithm to evaluate the continuous contribution of agent actions to the latent state. We compute this contribution through simulation-based reasoning and an action evaluation network, and then use counterfactual reasoning to isolate a single agent's influence on the environment. Using this continuous contribution as an intrinsic reward encourages each agent to consider the collective, thereby promoting cooperation. We conduct experiments in SSD environments, and the results show that the collective reward increases by at least 25%, demonstrating the strong performance of our algorithm compared with state-of-the-art methods.
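To make the core idea concrete, the sketch below illustrates one plausible reading of the counterfactual intrinsic reward: an action evaluation network scores the effect of an agent's actual action, and the agent's contribution is the gap between that score and a baseline averaged over counterfactual (alternative) actions. This is only a minimal illustration under assumed interfaces; the class `ActionEvaluationNet`, the function `counterfactual_intrinsic_reward`, and all dimensions are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn

class ActionEvaluationNet(nn.Module):
    """Hypothetical action evaluation network: scores the effect on the
    (latent) environment state when the agent takes a given action."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.n_actions = n_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_actions, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        # Encode the discrete action as one-hot before concatenation.
        one_hot = nn.functional.one_hot(action, self.n_actions).float()
        return self.net(torch.cat([state, one_hot], dim=-1)).squeeze(-1)


def counterfactual_intrinsic_reward(eval_net, state, taken_action):
    """Contribution of the taken action, estimated as its evaluated effect
    minus the average effect over all counterfactual actions."""
    with torch.no_grad():
        actual = eval_net(state, taken_action)
        baseline = torch.stack([
            eval_net(state, torch.full_like(taken_action, a))
            for a in range(eval_net.n_actions)
        ]).mean(dim=0)
    return actual - baseline


# Toy usage: a batch of 4 agents' observations (16-dim states, 5 actions).
eval_net = ActionEvaluationNet(state_dim=16, n_actions=5)
state = torch.randn(4, 16)
taken_action = torch.randint(0, 5, (4,))
r_intrinsic = counterfactual_intrinsic_reward(eval_net, state, taken_action)
print(r_intrinsic.shape)  # torch.Size([4])
```

In this reading, the intrinsic reward is positive when the chosen action is evaluated as more beneficial to the shared (latent) state than the agent's alternatives, which is what would push individual policies toward cooperative behavior.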
