Abstract

UAV pursuit-evasion strategies based on the Deep Deterministic Policy Gradient (DDPG) algorithm are a current research hotspot. However, DDPG explores the sample space inefficiently. To address this problem, this paper uses imitation learning (IL) to improve the DDPG exploration strategy. A quasi-proportional guidance control law is designed to generate effective learning samples, which are used as the initial data of the DDPG experience pool. A UAV pursuit-evasion strategy based on DDPG and imitation learning (IL-DDPG) is then proposed, in which the algorithm draws data from the experience pool for experience replay learning. This improves exploration efficiency in the initial stage of training and avoids excessive useless exploration during training. The simulation results show that the trained pursuit UAV can flexibly adjust its flight speed and flight attitude to pursue the evasion UAV quickly. They also verify that the improved DDPG algorithm trains more efficiently than the basic DDPG algorithm.
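
The idea of seeding the experience pool with guidance-law demonstrations can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, since the abstract does not give the paper's model: the planar constant-speed kinematics, the straight-flying evader, the turn-rate-only action, the reward shaping, all constants, and the function names are hypothetical. The expert here is a simple proportional-navigation-style rule with an added heading-error term, standing in for the paper's quasi-proportional guidance law, whose exact form is not stated.

```python
import math
import random
from collections import deque

# All dynamics, constants, and names are illustrative assumptions,
# not the paper's actual model.
DT = 0.1              # integration step (s)
NAV_GAIN = 3.0        # gain on the line-of-sight (LOS) rate
HEADING_GAIN = 1.0    # gain on the heading error toward the LOS
MAX_TURN = 1.0        # turn-rate limit (rad/s)
CAPTURE_RADIUS = 5.0  # capture distance (m)


def wrap_angle(a):
    """Wrap an angle to [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def guidance_action(state):
    """Stand-in expert: turn rate from LOS rate plus heading error, clipped."""
    px, py, ph, pv, ex, ey, eh, ev = state
    dx, dy = ex - px, ey - py
    r2 = max(dx * dx + dy * dy, 1e-6)
    # Velocity of the evader relative to the pursuer.
    dvx = ev * math.cos(eh) - pv * math.cos(ph)
    dvy = ev * math.sin(eh) - pv * math.sin(ph)
    los_rate = (dx * dvy - dy * dvx) / r2
    heading_error = wrap_angle(math.atan2(dy, dx) - ph)
    turn = NAV_GAIN * los_rate + HEADING_GAIN * heading_error
    return max(-MAX_TURN, min(MAX_TURN, turn))


def step(state, turn_rate):
    """Advance both vehicles one step; the evader flies straight (assumed)."""
    px, py, ph, pv, ex, ey, eh, ev = state
    ph = wrap_angle(ph + turn_rate * DT)
    px += pv * math.cos(ph) * DT
    py += pv * math.sin(ph) * DT
    ex += ev * math.cos(eh) * DT
    ey += ev * math.sin(eh) * DT
    return (px, py, ph, pv, ex, ey, eh, ev)


def distance(state):
    """Pursuer-to-evader range."""
    return math.hypot(state[4] - state[0], state[5] - state[1])


def collect_demonstrations(buffer, initial_state, max_steps=300):
    """Roll out the expert and store (s, a, r, s', done) tuples in the buffer."""
    state = initial_state
    for _ in range(max_steps):
        action = guidance_action(state)
        next_state = step(state, action)
        done = distance(next_state) < CAPTURE_RADIUS
        # Assumed reward: range reduction plus a bonus on capture.
        reward = distance(state) - distance(next_state) + (10.0 if done else 0.0)
        buffer.append((state, action, reward, next_state, done))
        state = next_state
        if done:
            break
    return state


# Pre-fill the experience pool before any DDPG update takes place.
replay_buffer = deque(maxlen=100_000)
# State layout: pursuer x, y, heading, speed; evader x, y, heading, speed.
start = (0.0, 0.0, 0.0, 30.0, 200.0, 100.0, 0.0, 15.0)
final = collect_demonstrations(replay_buffer, start)

# A DDPG learner would then sample minibatches from the pre-filled buffer.
rng = random.Random(0)
batch = rng.sample(list(replay_buffer), k=min(32, len(replay_buffer)))
```

In a full training loop, the agent's own transitions would be appended to the same buffer, so that early updates are dominated by expert samples and later updates by the learned policy's experience.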

