Abstract

In recent years, collaborative applications of unmanned system clusters have placed higher demands on reinforcement learning techniques. As the number of agents grows, multi-agent reinforcement learning algorithms such as Multi-Agent Deep Deterministic Policy Gradient (MADDPG) perform unsatisfactorily, and their time consumption becomes difficult to predict and control. To address this problem, this paper hypothesizes that state-action value estimation can use local information in place of global information. We verify this hypothesis in two scenarios, a predator-prey scenario and a dual-identity scenario, and obtain better training results than MADDPG. On this basis, we further analyze the effectiveness of three ways of selecting local information in food chain scenarios: the distance selection method, the type selection method, and the correlation class proximity method. The experiments show that the correlation class proximity selection method significantly improves on MADDPG in both training effect and training time consumption. The method in this paper provides support for applying multi-agent reinforcement learning to large-scale unmanned system cluster problems.
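As a rough illustration of the distance selection idea, the sketch below builds a critic input from an agent and its k nearest neighbours instead of from all agents. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `select_nearest` and `local_critic_input`, the use of Euclidean distance, and the array layout are all hypothetical choices made for this example.

```python
import numpy as np

def select_nearest(positions, agent_idx, k):
    """Return indices of the k agents closest to agent_idx (itself excluded).

    Assumption: proximity is Euclidean distance between agent positions.
    """
    dists = np.linalg.norm(positions - positions[agent_idx], axis=1)
    dists[agent_idx] = np.inf  # never select the agent itself
    return np.argsort(dists, kind="stable")[:k]

def local_critic_input(obs, actions, positions, agent_idx, k):
    """Concatenate observations and actions of the agent and its k neighbours.

    A global critic would instead concatenate obs and actions of all agents.
    """
    neighbours = select_nearest(positions, agent_idx, k)
    chosen = np.concatenate(([agent_idx], neighbours))
    return np.concatenate([obs[chosen].ravel(), actions[chosen].ravel()])

# Hypothetical toy data: 4 agents, 3-dim observations, 2-dim actions.
positions = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.0, 2.0]])
obs = np.arange(12, dtype=float).reshape(4, 3)
actions = np.arange(8, dtype=float).reshape(4, 2)

x = local_critic_input(obs, actions, positions, agent_idx=0, k=2)
```

In this toy setup the local input has 15 entries, whereas a global input over all four agents would have 20. The input size depends on k rather than on the total number of agents, which is the property that makes local information attractive as the cluster grows.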
