Better decision-making in the autonomous maneuvering of unmanned aerial vehicles (UAVs) improves flight safety, mission execution rate, and environmental adaptability. Deep reinforcement learning makes autonomous UAV maneuvering decisions possible, but current algorithms tend to train inefficiently and perform poorly on complex continuous maneuvering problems. To raise the level of autonomous UAV maneuvering and to explore safe, efficient maneuvering in complex environments, this paper proposes a maneuvering decision-making method based on hierarchical reinforcement learning and Proximal Policy Optimization (PPO), denoted H-PPO. Introducing hierarchical reinforcement learning into PPO separates the complex problem of UAV maneuvering and obstacle avoidance into high-level macro-maneuver guidance and low-level micro-action execution, which greatly simplifies a decision task that would otherwise fall to a single-layer PPO. In addition, static and dynamic threat zones are designed, and their quantity, size, and location are varied to increase environmental complexity, thereby improving the algorithm's adaptability and robustness to different conditions. Experimental results show that with five threat targets, H-PPO reaches the designated target point with a success rate of 80%, significantly higher than the 58% achieved by the original PPO. Both the average maneuvering distance and the average maneuvering time are lower than those of PPO, and the network computation time is 1.64 s, shorter than the 2.46 s required by PPO.
Moreover, as environmental complexity increases, H-PPO outperforms the other networks it is compared against, demonstrating that the proposed algorithm effectively guides intelligent agents to maneuver autonomously and avoid obstacles in complex, time-varying environments. This provides a feasible technical approach and theoretical support for autonomous maneuvering decisions in UAVs.
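
The two-level decomposition and the randomized threat zones described above can be illustrated with a minimal sketch. It assumes a 2D plane and circular threat zones, neither of which the abstract specifies. All names (`ThreatZone`, `make_threat_zones`, `high_level_policy`, `low_level_policy`, `run_episode`) and all numeric values are hypothetical, and both policies are hand-written heuristics standing in for trained PPO networks, so this shows only the control structure, not the method itself.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ThreatZone:
    """Circular threat zone (assumed shape); zero velocity means static."""
    x: float
    y: float
    radius: float
    vx: float = 0.0
    vy: float = 0.0

    def step(self, dt: float) -> None:
        # Dynamic zones drift at constant velocity; static zones stay put.
        self.x += self.vx * dt
        self.y += self.vy * dt

    def contains(self, px: float, py: float) -> bool:
        return math.hypot(px - self.x, py - self.y) <= self.radius


def make_threat_zones(n, rng, area=100.0, r_range=(3.0, 8.0),
                      dynamic_fraction=0.5, max_speed=1.0):
    """Randomize quantity, size, and location of threat zones (values assumed)."""
    zones = []
    for _ in range(n):
        moving = rng.random() < dynamic_fraction
        zones.append(ThreatZone(
            x=rng.uniform(0.0, area),
            y=rng.uniform(0.0, area),
            radius=rng.uniform(*r_range),
            vx=rng.uniform(-max_speed, max_speed) if moving else 0.0,
            vy=rng.uniform(-max_speed, max_speed) if moving else 0.0,
        ))
    return zones


def high_level_policy(pos, goal, zones, lookahead=10.0):
    """Stand-in for the high-level policy: choose a macro waypoint.

    Tries the direct heading first, then increasingly large detours, and
    returns the first waypoint that lies outside every threat zone.
    """
    base = math.atan2(goal[1] - pos[1], goal[0] - pos[0])
    step = min(lookahead, math.hypot(goal[0] - pos[0], goal[1] - pos[1]))
    for offset_deg in (0, 30, -30, 60, -60, 90, -90):
        ang = base + math.radians(offset_deg)
        wp = (pos[0] + step * math.cos(ang), pos[1] + step * math.sin(ang))
        if not any(z.contains(*wp) for z in zones):
            return wp
    return pos  # hold position if every candidate waypoint is blocked


def low_level_policy(pos, waypoint, max_step=1.0):
    """Stand-in for the low-level policy: one bounded step toward the waypoint."""
    dx, dy = waypoint[0] - pos[0], waypoint[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist <= max_step:
        return waypoint
    return (pos[0] + dx / dist * max_step, pos[1] + dy / dist * max_step)


def run_episode(seed=0, n_zones=5, max_steps=500, replan_every=5):
    """Run one episode; the high level replans every `replan_every` steps."""
    rng = random.Random(seed)
    zones = make_threat_zones(n_zones, rng)
    pos, goal = (0.0, 0.0), (100.0, 100.0)
    waypoint = pos
    for t in range(max_steps):
        if t % replan_every == 0:
            waypoint = high_level_policy(pos, goal, zones)
        pos = low_level_policy(pos, waypoint)
        for z in zones:
            z.step(1.0)
        if any(z.contains(*pos) for z in zones):
            return "collision", t + 1
        if math.hypot(goal[0] - pos[0], goal[1] - pos[1]) < 1.0:
            return "success", t + 1
    return "timeout", max_steps
```

Calling `run_episode(seed=0, n_zones=5)` returns an outcome label and a step count. Repeating it over many seeds would yield a success rate for these heuristic stand-ins, which is not comparable in value to the figures reported above.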