Abstract

In this paper, we study UAV ground target tracking in obstacle environments using deep reinforcement learning, and present an improved deep deterministic policy gradient (DDPG) algorithm. A reward function based on line of sight and the artificial potential field is constructed to guide the UAV toward target tracking, and a penalty term on the action keeps the trajectory smooth. To improve exploration, multiple UAVs, controlled by the same policy network, perform the task in each episode. Since historical observations are strongly correlated with the policy, long short-term memory networks are used to approximate the state of the environment, which improves the approximation accuracy and the efficiency of data utilization. Simulation results show that the proposed method enables the UAV to track the target and avoid obstacles effectively.
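As a concrete illustration of the reward design described above, the following is a minimal Python sketch, not the authors' exact formulation: a line-of-sight tracking term, an artificial-potential-field repulsion term, and an action-smoothness penalty. The weights `w_track`, `w_apf`, `w_act` and the safety distance `d_safe` are hypothetical placeholders.

```python
import numpy as np

def reward(uav_pos, uav_vel, target_pos, obstacles, action, prev_action,
           w_track=1.0, w_apf=0.5, w_act=0.1, d_safe=5.0):
    """Hypothetical reward: line-of-sight tracking + APF repulsion + action penalty."""
    # Line-of-sight term: reward velocity aligned with the UAV-to-target direction.
    los = target_pos - uav_pos
    los_dir = los / (np.linalg.norm(los) + 1e-8)
    r_track = w_track * float(np.dot(uav_vel, los_dir))

    # Artificial-potential-field term: repulsive penalty near each obstacle,
    # growing quadratically as the UAV enters the safety margin.
    r_apf = 0.0
    for obs_pos, obs_radius in obstacles:
        d = np.linalg.norm(uav_pos - obs_pos) - obs_radius
        if d < d_safe:
            r_apf -= w_apf * (1.0 / max(d, 1e-3) - 1.0 / d_safe) ** 2

    # Action-smoothness penalty: discourage abrupt control changes.
    r_act = -w_act * float(np.linalg.norm(action - prev_action) ** 2)

    return r_track + r_apf + r_act
```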

Highlights

  • Unmanned aerial vehicles (UAVs) have the advantages of safety, low cost and high maneuverability

  • The improved deep deterministic policy gradient (DDPG) algorithm is trained in a virtual simulation environment, and the well-trained algorithm can be used for online target tracking and obstacle avoidance in new dynamic environments

  • The BACKGROUND section introduces the background knowledge of DRL-based ground target tracking and obstacle avoidance, including the DDPG algorithm and the environment used for tracking


Summary

INTRODUCTION

Unmanned aerial vehicles (UAVs) have the advantages of safety, low cost, and high maneuverability. Highly autonomous online trajectory planning of UAVs for target tracking and obstacle avoidance in unknown working environments has attracted great attention [2]–[4]. Deep deterministic policy gradient (DDPG) [27] is a DRL algorithm that combines DQN with actor-critic and operates in continuous action spaces. Dynamic and partially observable environments are major challenges for UAV target tracking [26]. To overcome these difficulties, we improve DDPG in terms of the reward function and data usage. The improved DDPG algorithm is trained in a virtual simulation environment, and the well-trained algorithm can be used for online target tracking and obstacle avoidance in new dynamic environments. The background section covers the DDPG algorithm, the ground target tracking environment, the kinetic model of the UAV, and the observation and action spaces.
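For orientation, here is a minimal PyTorch sketch of an LSTM-based deterministic actor in the spirit of the approach described above: the LSTM summarizes the history of partial observations into a state estimate from which a continuous action is produced. The layer sizes, the two-layer head, and the use of tanh to bound actions are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    """Hypothetical DDPG actor: an LSTM approximates the environment state
    from the observation history; a small MLP head outputs the action."""

    def __init__(self, obs_dim, act_dim, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, act_dim), nn.Tanh(),  # bounded continuous action
        )

    def forward(self, obs_seq):
        # obs_seq: (batch, time, obs_dim) -- the history of partial observations.
        _, (h_n, _) = self.lstm(obs_seq)
        return self.head(h_n[-1])  # act from the final hidden state

# Usage sketch: a 10-step history of 12-dim observations, 2-dim action.
actor = LSTMActor(obs_dim=12, act_dim=2)
action = actor(torch.randn(1, 10, 12))
```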

BACKGROUND
ENVIRONMENTS
OBSERVATION AND ACTION SPACE
REWARD FUNCTION
EXPERIMENTS
EXPERIMENT RESULT
Findings
CONCLUSION AND PROSPECT