Abstract

In recent years, collaborative applications of unmanned system clusters have placed higher demands on reinforcement learning techniques. As the number of agents grows, multi-agent reinforcement learning algorithms such as Multi-Agent Deep Deterministic Policy Gradient (MADDPG) perform unsatisfactorily, and their time consumption becomes difficult to predict and control. To address this problem, this paper hypothesizes that state-action value estimation can use local information in place of global information, and verifies this hypothesis in two scenarios, a predator-prey scenario and a dual-identity scenario, obtaining better training results than MADDPG. On this basis, we further analyze the effectiveness of three ways of selecting local information in food chain scenarios: the distance selection method, the type selection method, and the correlation class proximity method. The experiments show that the correlation class proximity selection method significantly improves on MADDPG in both training effect and training time. The proposed method provides support for applying multi-agent reinforcement learning to large-scale unmanned system cluster problems.
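
To make the idea of local information concrete, the following is a minimal sketch of distance-based selection, assuming each agent has a position, an observation vector, and an action vector. The function names `select_nearest` and `local_critic_input`, the Euclidean distance metric, and the fixed neighbor count `k` are illustrative assumptions, not details taken from the paper.

```python
import math


def select_nearest(positions, agent, k):
    """Return indices of the k agents closest to `agent`, excluding itself.

    Hypothetical helper: Euclidean distance and a fixed k are assumptions.
    """
    others = [j for j in range(len(positions)) if j != agent]
    # Stable sort, so equally distant agents keep their index order.
    others.sort(key=lambda j: math.dist(positions[agent], positions[j]))
    return others[:k]


def local_critic_input(observations, actions, agent, neighbors):
    """Concatenate the agent's own observation and action with its neighbors'.

    A critic fed this vector sees k + 1 agents instead of all of them,
    so its input size does not grow with the total number of agents.
    """
    features = []
    for j in [agent] + list(neighbors):
        features.extend(observations[j])
        features.extend(actions[j])
    return features


positions = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.0, 2.0)]
observations = [[0.0], [1.0], [2.0], [3.0]]
actions = [[0.1], [0.2], [0.3], [0.4]]

neighbors = select_nearest(positions, agent=0, k=2)
critic_input = local_critic_input(observations, actions, 0, neighbors)
```

In this sketch the critic input for agent 0 is built from three agents rather than four. With a global critic, as in MADDPG, the input would instead cover every agent and grow linearly with the cluster size.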
