Abstract

Multi-agent reinforcement learning (MARL) is a machine learning method that solves problems by using multiple learning agents in a data-driven manner. By leveraging multiple agents simultaneously, MARL has become an efficient approach to large-scale problems in a wide range of fields. However, as with single-agent reinforcement learning, MARL requires trial and error to acquire appropriate policies for each agent during learning. How to guarantee performance and constraint satisfaction in MARL is therefore a critical issue for applying it to real-world problems. In this study, we propose Information-sharing Constrained Policy Optimization (IsCPO), a MARL method that guarantees constraint satisfaction during learning. Specifically, IsCPO updates the policies of multiple agents sequentially in random order while sharing with the next agent the surrogate costs and KL divergences used to evaluate the current and updated policies. In addition, if no candidate policy update is feasible under the shared information, IsCPO skips updating the remaining agents until the next iteration. As a result, IsCPO acquires suboptimal individual policies for the agents while satisfying constraints on global costs that depend on the state of the environment and the actions of multiple agents. We also introduce a practical algorithm for IsCPO that simplifies its implementation through several mathematical approximations. Finally, we demonstrate its validity and effectiveness through simulation results on a multiple cart-pole problem and a base-station sleep control problem in a mobile network.
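
The sequential, budget-sharing update scheme summarized above can be sketched as follows. This is a minimal illustrative sketch only, assuming a generic constrained-policy-update routine; the names `Agent`, `try_update`, and the budget variables are hypothetical and are not taken from the paper.

```python
# Sketch of an IsCPO-style iteration, assuming each agent exposes a
# hypothetical try_update(cost_budget, kl_budget) that returns the surrogate
# cost and KL divergence consumed by an accepted update, or None if no
# feasible candidate policy exists within the given budgets.
import random

def iscpo_iteration(agents, cost_limit, kl_limit):
    """Update agents one by one in random order, passing the remaining
    surrogate-cost and KL budgets to the next agent; if no feasible update
    remains, skip the rest of the agents until the next iteration."""
    remaining_cost = cost_limit   # shared budget on the global surrogate cost
    remaining_kl = kl_limit       # shared trust-region (KL) budget
    for agent in random.sample(agents, len(agents)):
        result = agent.try_update(remaining_cost, remaining_kl)
        if result is None:
            break  # no admissible candidate: skip remaining agents
        cost_used, kl_used = result
        remaining_cost -= cost_used
        remaining_kl -= kl_used
```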
