Abstract

Target tracking using an unmanned aerial vehicle (UAV) is a challenging robotic problem: it requires handling a high level of nonlinearity and fast dynamics. Model-free control handles the uncertain nature of the problem effectively, and reinforcement learning (RL)-based approaches are good candidates for solving it. In this article, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, a recent composite RL architecture, was explored as a tracking agent for the UAV-based target tracking problem. Several improvements to the original TD3 were also made. First, a proportional-differential (PD) controller was used to boost the exploration of TD3 during training. Second, a novel reward formulation for UAV-based target tracking enabled a careful combination of the various dynamic variables in the reward function: two exponential functions limit the effect of velocity and acceleration, preventing deformation of the policy function approximation. In addition, the concept of multistage training based on the dynamic variables was proposed as an alternative to one-stage combinatory training. Third, the reward function was enhanced with a piecewise decomposition, enabling more stable learning behaviour of the policy and moving from a linear reward to an achievement formula. Training was conducted on fixed target tracking followed by moving target tracking. Flight testing was conducted on three types of target trajectories: fixed, square, and blinking. Multistage training with both exponential and achievement rewarding achieved the best performance for the fixed-trained agent on the fixed and square moving targets, while the combined agent with both exponential and achievement rewarding performed best for a fixed-trained agent in the case of a blinking target.
Relative to the traditional proportional-differential controller, the maximum error reduction rate is 86%. The developed achievement rewarding and the multistage training open the door to various applications of RL in target tracking.
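The reward shaping summarised above can be sketched as a position-error penalty with exponentially saturated velocity and acceleration terms, plus a piecewise achievement bonus near the target. The function below is an illustrative reconstruction only; the coefficients, thresholds, and exact functional form are assumptions, not the paper's actual formulation.

```python
import math

def tracking_reward(pos_error, velocity, acceleration,
                    k_e=1.0, k_v=0.5, k_a=0.5,
                    goal_radius=0.2, bonus=10.0):
    """Illustrative reward sketch: linear position-error penalty,
    exponentially saturated velocity/acceleration penalties, and a
    piecewise 'achievement' bonus inside the goal radius.
    All gains and thresholds are hypothetical."""
    # Linear penalty on the tracking (position) error.
    r = -k_e * pos_error
    # Exponential saturation bounds the influence of velocity and
    # acceleration, so large transients cannot dominate the reward.
    r -= k_v * (1.0 - math.exp(-abs(velocity)))
    r -= k_a * (1.0 - math.exp(-abs(acceleration)))
    # Piecewise achievement term: discrete bonus when the UAV is
    # within the goal radius of the target.
    if pos_error < goal_radius:
        r += bonus
    return r
```

Because the velocity and acceleration penalties saturate at `k_v` and `k_a`, the agent is never pushed toward degenerate policies by extreme dynamic values, which is the stated motivation for the exponential terms.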

Highlights

  • Unmanned aerial vehicle (UAV) applications are increasing day by day, and aerial vehicles are being used as part of many recent technological applications

  • It was observed that the Multilevel-Achievement (MLA), Combined-AchievementExponential (CAE) and Combined-Achievement (CA) agents all performed much better than both the proportional-differential (PD) controller and the Combined (C) agent, which is just a classical TD3-based model with no modifications

  • The error ranges show that adding an achievement term to TD3 improves the tracking performance and reduces the error. Combining the achievement rewarding formula with the exponential weighting terms performs better than using achievement rewarding alone. Another observation is that the boxplot width is reduced for the achievement-based agents, namely CA, CAE, MLA and Multilevel-AchievementExponential (MLAE), which means more stable performance when they are trained on a fixed target, i.e., the F-agent
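The PD controller mentioned in these highlights serves both as the comparison baseline and, per the abstract, as an exploration booster during TD3 training. A minimal sketch of such PD-guided exploration follows, assuming illustrative gains and a linear annealing schedule from PD command to learned policy; none of these specifics come from the paper.

```python
import numpy as np

def pd_action(error, error_rate, kp=1.2, kd=0.3):
    # Classical proportional-differential command on the tracking error.
    return kp * error + kd * error_rate

def exploration_action(policy_action, error, error_rate,
                       step, warmup=10_000, noise_std=0.1):
    """Blend the PD command with the learned policy action during
    training, annealing toward pure policy output (hypothetical
    schedule), then add TD3's usual Gaussian exploration noise."""
    alpha = min(step / warmup, 1.0)   # 0 -> pure PD, 1 -> pure policy
    guided = (1 - alpha) * pd_action(error, error_rate) \
             + alpha * policy_action
    return guided + np.random.normal(0.0, noise_std,
                                     size=np.shape(guided))
```

Early in training the agent therefore visits the informative states a competent PD controller reaches, which is one plausible reading of "boosting the exploration of TD3".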



Introduction

Unmanned aerial vehicle (UAV) applications are increasing day by day, and aerial vehicles are being used in many recent technological applications, for example shipping [1], surveillance [2], [3], [4], battlefield operations [5], rescue [6], [7], and inspection [8], [9]. Aerial vehicles are divided into three categories: teleoperated [10], [11], semi-autonomous [12], [13], and fully autonomous [14]. Enabling aerial vehicle applications requires essential autonomy features within the system: an autonomous vehicle must be able to decide on and react to events without direct human intervention. Some fundamental aspects are common to all autonomous vehicles, including sensing and perceiving the environment and analysing the gained information, …


