Most multi-agent reinforcement learning (MARL) approaches optimize a strategy by improving the policy itself, while ignoring the limitations of homogeneous agents, each of which may serve only a single function. In practice, however, complex tasks tend to require coordinating various types of agents that leverage one another's advantages. How to establish appropriate communication among such agents and optimize their decisions is therefore a vital research issue. To this end, we propose Hierarchical Attention Master–Slave (HAMS) MARL, in which the Hierarchical Attention balances the weight allocation within and among clusters, and the Master–Slave architecture endows agents with independent reasoning and individual guidance. With this design, information fusion, especially among clusters, is implemented effectively and excessive communication is avoided; moreover, selectively composed actions optimize the decision. We evaluate HAMS on both small-scale and large-scale heterogeneous StarCraft II micromanagement tasks. The proposed algorithm achieves win rates of more than 80% in all evaluation scenarios, including a win rate of over 90% on the largest map. The experiments demonstrate a maximum improvement in win rate of 47% over the best known algorithm. The results show that our proposal outperforms recent state-of-the-art approaches and provides a novel idea for heterogeneous multi-agent policy optimization.
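To make the idea of weight allocation "within and among clusters" concrete, the following is a minimal sketch of a generic two-level attention scheme. It is not the HAMS implementation: the abstract does not specify the scoring function, so scaled dot-product attention is assumed here, and every name (`attend`, `hierarchical_attention`, the example unit clusters and feature vectors) is hypothetical and chosen for illustration only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys, values):
    """Scaled dot-product attention for a single query (assumed scoring rule).

    Returns the attention weights and the weighted sum of the values.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, pooled

def hierarchical_attention(query, clusters):
    """Two-level attention over agents grouped by type.

    clusters maps a (hypothetical) cluster name to a list of agent feature vectors.
    Level 1 attends within each cluster to produce one summary per cluster.
    Level 2 attends among the cluster summaries to produce a fused vector.
    """
    names = list(clusters)
    intra_weights = {}
    summaries = []
    for name in names:
        feats = clusters[name]
        weights, summary = attend(query, feats, feats)
        intra_weights[name] = weights
        summaries.append(summary)
    inter, fused = attend(query, summaries, summaries)
    return intra_weights, dict(zip(names, inter)), fused

# Illustrative data only: two unit types with made-up 2-d features.
clusters = {
    "marines": [[1.0, 0.0], [0.8, 0.2]],
    "medivacs": [[0.0, 1.0]],
}
intra, inter, fused = hierarchical_attention([1.0, 0.0], clusters)
print(intra)
print(inter)
print(fused)
```

In this sketch each agent's influence on the fused vector is the product of its within-cluster weight and its cluster's among-cluster weight, so a whole cluster can be down-weighted without inspecting every member individually. That is one plausible reading of how a hierarchy could limit excessive communication; the paper's actual mechanism may differ.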