Abstract

With the development of unmanned aerial vehicle (UAV) technology, UAV swarm confrontation has attracted many researchers' attention. However, the situation faced by a UAV swarm involves substantial uncertainty and dynamic variability, and the state and action spaces grow exponentially with the number of UAVs, making autonomous decision-making in the confrontation environment a difficult problem. In this paper, a multiagent reinforcement learning (MARL) method with macro actions and human expertise is proposed for the autonomous decision-making of UAVs. In the proposed approach, the UAV swarm is modeled as a large multiagent system (MAS) with each individual UAV as an agent, and the sequential decision-making problem in swarm confrontation is modeled as a Markov decision process. Agents are trained over macro actions, which effectively mitigates the problems of sparse and delayed rewards and of the large state and action spaces. The key to the success of this method is the design of macro actions that allow the high-level policy to find a near-optimal solution; we further leverage human expertise to design a set of good macro actions. Extensive empirical experiments in our constructed swarm confrontation environment show that our method outperforms the other algorithms.
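As a rough illustration of the temporal abstraction described above, the following toy sketch shows a high-level policy selecting macro actions, each of which executes several primitive steps before control returns to the policy. All names, dynamics, and rewards here are hypothetical placeholders, not the paper's actual environment or learned policy:

```python
# Toy semi-MDP loop over macro actions (illustrative only).
# Each macro action runs a fixed number of primitive steps; the
# high-level policy is consulted only at macro-action boundaries,
# which shortens the effective decision horizon.
MACRO_ACTIONS = ["approach", "attack", "retreat"]

def execute_macro(state, macro, max_steps=5):
    """Run the macro's primitive steps on a toy scalar state;
    return the next state and the reward accumulated inside it."""
    total_reward = 0.0
    for _ in range(max_steps):
        delta = {"approach": +1, "attack": 0, "retreat": -1}[macro]
        state += delta
        # toy reward: attacking pays off only once close enough
        total_reward += 1.0 if macro == "attack" and state > 3 else 0.0
    return state, total_reward

def high_level_policy(state):
    # placeholder for the learned high-level policy:
    # close the distance first, then attack
    return "attack" if state > 3 else "approach"

def rollout(episode_len=4):
    state, episode_return = 0, 0.0
    for _ in range(episode_len):
        macro = high_level_policy(state)
        state, reward = execute_macro(state, macro)
        episode_return += reward
    return episode_return
```

With macro actions, the policy makes `episode_len` decisions instead of `episode_len * max_steps`, which is the mechanism that eases the sparse/delayed-reward and large-action-space problems the abstract refers to.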

Highlights

  • Unmanned aerial vehicles (UAVs) have the advantages of low cost, flexible maneuverability, strong concealment, and the ability to operate in harsh environments

  • We compare hierarchical multiagent deep deterministic policy gradient (hMADDPG), multiagent deep deterministic policy gradient (MADDPG), and independent DDPG (i-DDPG) in our constructed environment, where the allied UAVs are controlled by the agents we trained, and the enemy UAVs are controlled by a heuristic rule that attacks the nearest enemy within the attack range

  • The agent-specific global state required for MARL training is described in Section 4.5, including each UAV’s heading, distance, relative position, and attacking angle with respect to a given UAV, while the local observation includes only the information of UAVs within the detection range
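The heuristic rule controlling the enemy UAVs above can be sketched as a nearest-in-range target selection. The function name and the (x, y)-tuple data layout are illustrative assumptions, not the paper's interface:

```python
import math

def nearest_target(uav, enemies, attack_range):
    """Heuristic baseline: return the nearest enemy within attack
    range, or None when no enemy is in range. Positions are (x, y)
    tuples; this data layout is an assumption for illustration."""
    in_range = [
        (math.dist(uav, enemy), enemy)
        for enemy in enemies
        if math.dist(uav, enemy) <= attack_range
    ]
    return min(in_range)[1] if in_range else None
```

Such a fixed rule gives a reproducible opponent, so differences between hMADDPG, MADDPG, and i-DDPG reflect the trained allied policies rather than opponent variability.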


Introduction

UAVs have the advantages of low cost, flexible maneuverability, strong concealment, and the ability to operate in harsh environments. They are widely used in the military field and have played an essential role in many local wars. The use of multiple UAVs coordinating through inter-UAV communication can expand perception of the environmental situation; achieve coordinated task assignment, collaborative reconnaissance, and attack; and effectively improve survivability and overall combat effectiveness [1]. This environment, in which multiple UAVs cooperate with allies and compete with enemies, naturally constitutes a multiagent system. Training UAVs to complete complex tasks together in such a dynamically changing environment has essential research significance.
