Abstract
The deep deterministic policy gradient (DDPG) algorithm is a reinforcement learning method that has been widely used in UAV path planning. However, the critic network of DDPG is updated frequently during training, which leads to an unavoidable overestimation problem and increases the computational cost of training. This paper therefore presents a multicritic-delayed DDPG method for UAV path planning. The method uses multiple critic networks and delayed learning to reduce DDPG's overestimation problem, and adds noise to improve robustness in real environments. Moreover, a UAV mission platform is built to train the method and evaluate its effectiveness and robustness. Simulation results show that the proposed algorithm converges faster and more stably to a better policy, indicating that the UAV can learn more from a complex environment.
Highlights
The deep Q-network (DQN) enhances the stability of the training process by using an experience replay memory and a target network
Lillicrap et al. [31] proposed the deep deterministic policy gradient (DDPG) algorithm, which improves the stability of actor-critic (A-C) evaluation by adopting DQN's target network and experience replay
To address the actor network's heavy reliance on the critic network, which makes DDPG performance very sensitive to critic learning, this paper proposes a multicritic-delayed DDPG method for UAV path planning
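The two DQN stabilizers named in the highlights can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the names `ReplayBuffer`, `soft_update`, and the `tau` value are assumptions chosen for clarity.

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience replay memory: stores transitions and samples them
    uniformly, breaking the correlation between consecutive steps."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # old transitions drop out

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch, independent of insertion order.
        return random.sample(self.buffer, batch_size)

def soft_update(target_params, online_params, tau=0.005):
    """Target-network stabilizer: move target weights a small step
    toward the online weights (Polyak averaging)."""
    return [tau * o + (1.0 - tau) * t
            for t, o in zip(target_params, online_params)]
```

In practice the target network's slow drift keeps the bootstrapped TD target from chasing a moving estimate, which is the stability benefit the highlight refers to.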
Summary
In recent years, unmanned aerial vehicles (UAVs) have been widely adopted; thanks to their high maneuverability and rapid deployability, they have been applied to search and rescue [1], multi-UAV cooperation [2], formation flight [3], remote surveillance [4], and other fields [5–7]. Reinforcement learning is independent of environmental models and prior knowledge, so it can effectively solve the UAV path planning problem in unknown environments. To address the actor network's heavy reliance on the critic network, which makes DDPG performance very sensitive to critic learning, this paper proposes a multicritic-delayed DDPG method for UAV path planning. The second component is to use multiple critics to average out estimation error, resolving the error accumulation caused by overestimation. (2) We apply the proposed multicritic-delayed deep deterministic policy gradient method to UAV path planning. Simulation results show that the proposed algorithm is effective, with strong robustness and adaptability, for planning paths to the UAV's flight destination in complex environments.
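The multicritic averaging and delayed learning described above can be sketched as follows. Since the paper's network architectures are not reproduced here, linear critics stand in as an assumption, and names such as `n_critics` and `policy_delay` are illustrative rather than the authors' choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_critics, policy_delay, gamma = 3, 2, 0.99

# Each "critic" is a weight vector scoring (state, action) features;
# targets are held in a separate, slowly updated copy.
critics = [rng.normal(size=4) for _ in range(n_critics)]
target_critics = [c.copy() for c in critics]

def multicritic_target(features_next, reward, done):
    # Average the target critics' estimates so that no single critic's
    # optimistic error dominates the TD target (the averaging step that
    # counters overestimation; TD3 instead takes a pairwise minimum).
    q_next = np.mean([tc @ features_next for tc in target_critics])
    return reward + gamma * (1.0 - done) * q_next

actor_updates = 0
for step in range(10):
    # ... critic regression toward multicritic_target would go here ...
    if step % policy_delay == 0:
        # Delayed learning: the actor (and the target networks) update
        # less often than the critics, so the policy gradient is taken
        # against more settled value estimates.
        actor_updates += 1
```

Averaging several critics trades a little bias for lower variance in the TD target, which is the mechanism the summary credits for suppressing error accumulation.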