Multiagent learning methods suffer from low reward growth, long training times, and poor stability when dealing with complex environments and larger numbers of agents. To address these problems, this paper proposes a fast optimal coordination method for multiagent systems in complex environments (FOC-MACE). First, an environment exploration strategy is introduced into the policy network of the MADDPG method to achieve higher reward growth. Second, parallel computing is adopted in the critic network to effectively reduce training time. Together, these tactics also enhance the stability of multiagent learning. Finally, optimal resource allocation is carried out to realize optimal coevolution of the agents and further improve the learning ability of the agent group. To verify its effectiveness, FOC-MACE is compared with several state-of-the-art methods in the MPE environment. Three experiments show that our method increases reward growth by up to 37.1%, speeds up training significantly, and improves stability, as measured by standardized variance. In addition, the method is validated in UAV scenarios, demonstrating its practical performance. Taken together, the experiments and scenario validations confirm the effectiveness of the proposed fast optimal coordination method for multiagent systems in complex environments.
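
To illustrate the first tactic, the sketch below shows one common way an exploration strategy can be layered on a MADDPG-style deterministic actor: Gaussian noise is added to the policy's action before it is executed. This is a minimal sketch under stated assumptions; the class name, network architecture, and noise scheme are illustrative choices, not details taken from the paper, whose actual exploration strategy may differ.

    import torch
    import torch.nn as nn

    class Actor(nn.Module):
        # Deterministic per-agent policy network, as in MADDPG.
        # Layer sizes here are illustrative assumptions.
        def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, act_dim), nn.Tanh(),
            )

        def forward(self, obs: torch.Tensor) -> torch.Tensor:
            return self.net(obs)

    def explore_action(actor: Actor, obs: torch.Tensor,
                       noise_scale: float = 0.1) -> torch.Tensor:
        # One generic exploration strategy: perturb the deterministic
        # action with Gaussian noise, then clip to the valid range.
        with torch.no_grad():
            action = actor(obs)
            noisy = action + noise_scale * torch.randn_like(action)
            return noisy.clamp(-1.0, 1.0)

    # Example usage with arbitrary dimensions:
    # actor = Actor(obs_dim=8, act_dim=2)
    # a = explore_action(actor, torch.randn(8))

Annealing noise_scale over the course of training is a typical way to shift from exploration toward exploitation as rewards stabilize.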