Abstract

The deep deterministic policy gradient (DDPG) algorithm is a reinforcement learning method that has been widely used in UAV path planning. However, the critic network of DDPG is updated frequently during training, which leads to an unavoidable overestimation problem and increases the computational complexity of training. This paper therefore presents a multicritic-delayed DDPG method for solving the UAV path planning problem. It uses multiple critic networks and delayed learning to reduce the overestimation problem of DDPG, and it adds noise to improve robustness in real environments. Moreover, a UAV mission platform is built to train and evaluate the effectiveness and robustness of the proposed method. Simulation results show that the proposed algorithm converges faster, reaches a better final performance, and is more stable, indicating that the UAV can learn more from a complex environment.
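As a rough illustration of the idea described above, the following Python sketch shows how a TD target could be formed by averaging several target critics, with the actor updated on a delay and clipped noise added to the target action. The network objects, the noise scales, and the delay value are assumptions made for illustration; this is not the authors' exact implementation.

    import torch

    # Illustrative hyperparameters (assumed, not taken from the paper).
    GAMMA, POLICY_DELAY = 0.99, 2
    NOISE_STD, NOISE_CLIP, MAX_ACTION = 0.2, 0.5, 1.0

    def td_target(reward, next_state, done, target_actor, target_critics):
        """Average the target critics' estimates to damp overestimation."""
        with torch.no_grad():
            next_action = target_actor(next_state)
            # Clipped noise on the target action adds robustness.
            noise = torch.clamp(NOISE_STD * torch.randn_like(next_action),
                                -NOISE_CLIP, NOISE_CLIP)
            next_action = torch.clamp(next_action + noise,
                                      -MAX_ACTION, MAX_ACTION)
            # Mean over the critic ensemble instead of a single estimate.
            q_next = torch.stack([qc(next_state, next_action)
                                  for qc in target_critics]).mean(dim=0)
            return reward + GAMMA * (1.0 - done) * q_next

    def should_update_actor(step):
        """Delayed learning: update the actor (and targets) less often."""
        return step % POLICY_DELAY == 0

Averaging the critics and delaying the actor update are the two levers the abstract names; where exactly the noise enters is an assumption of this sketch.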

Highlights

  • Deep Q-network (DQN) enhances the stability of the training process by using experience replay memory and a target network (a minimal sketch of both mechanisms follows this list)

  • Lillicrap et al. [31] proposed the deep deterministic policy gradient (DDPG) algorithm, which improves the stability of actor-critic (A-C) evaluation by reusing the target network and experience replay of DQN

  • Because the actor network relies heavily on the critic network, DDPG performance is very sensitive to critic learning; to address this, this paper proposes a multicritic-delayed DDPG method for solving UAV path planning
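The first highlight refers to DQN's two stabilizers. Below is a minimal Python sketch of both, assuming hypothetical q_net and target_net modules; the buffer and batch sizes are illustrative, not the paper's settings.

    import random
    from collections import deque

    import torch

    replay = deque(maxlen=100_000)  # experience replay memory (size assumed)

    def sample_batch(batch_size=64):
        """Uniform sampling decorrelates consecutive transitions."""
        s, a, r, s2, d = zip(*random.sample(list(replay), batch_size))
        return (torch.stack(s), torch.tensor(a), torch.tensor(r),
                torch.stack(s2), torch.tensor(d, dtype=torch.float32))

    def dqn_loss(batch, q_net, target_net, gamma=0.99):
        """TD loss that bootstraps from a frozen target network."""
        s, a, r, s2, done = batch
        q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            # The target network changes slowly, stabilizing the target y.
            q_next = target_net(s2).max(dim=1).values
            y = r + gamma * (1.0 - done) * q_next
        return torch.nn.functional.mse_loss(q, y)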


Summary


In recent years, unmanned aerial vehicles (UAVs) have been widely applied; their high maneuverability and rapid deployability have brought them to search and rescue [1], multi-UAV cooperation [2], formation flight [3], remote surveillance [4], and other fields [5–7]. Reinforcement learning does not depend on environmental models or prior knowledge, so it can effectively solve the UAV path planning problem in unknown environments. Because the actor network relies heavily on the critic network, making DDPG performance very sensitive to critic learning, this paper proposes a multicritic-delayed DDPG method for UAV path planning. The method uses multiple critics to average out estimation error, countering the error accumulation caused by overestimation, and the resulting multicritic-delayed deep deterministic policy gradient method is applied to UAV path planning. Simulation results show that the proposed algorithm is effective, with strong robustness and adaptability, for planning the path of a UAV flying to its destination in a complex environment.
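To see why averaging several critics damps overestimation, consider a toy experiment: each critic observes the true action values plus independent zero-mean noise, and the learner takes the maximum over actions. This demo is an assumption-level illustration (independent unit-variance critic noise, not data from the paper); it shows that the maximum over the averaged estimates is biased upward far less than the maximum over a single critic's estimates.

    import numpy as np

    rng = np.random.default_rng(0)
    true_q = np.zeros(10)  # the true value of all 10 actions is 0
    # 5 critics, each with independent unit-variance noise, 100k trials.
    noise = rng.normal(0.0, 1.0, size=(5, 10, 100_000))

    single = (true_q[:, None] + noise[0]).max(axis=0).mean()
    averaged = (true_q[:, None] + noise.mean(axis=0)).max(axis=0).mean()
    print(f"overestimation with one critic   : {single:.3f}")    # ~1.54
    print(f"overestimation with 5-critic mean: {averaged:.3f}")  # ~0.69

Averaging shrinks the noise standard deviation by a factor of 1/sqrt(5), so the upward bias introduced by the max shrinks with it.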

UAV Motion Model
Reinforcement Learning
Nonsparse Reward Model
Deep Deterministic Policy Gradient
Twin-Delayed Deep Deterministic
Multicritic-Delayed DDPG Method
Experimental Platform Setting
Performance of Multicritic-Delayed
Testing of Different Algorithms
Testing of Complex Environment
Findings
Conclusion
