Abstract

Target tracking using an unmanned aerial vehicle (UAV) is a challenging robotic problem: it requires handling a high level of nonlinearity and fast dynamics. Model-free control handles the uncertain nature of the problem effectively, and reinforcement learning (RL)-based approaches are good candidates for solving it. In this article, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, a recent composite RL architecture, was explored as a tracking agent for the UAV-based target tracking problem. Several improvements to the original TD3 were also made. First, a proportional-differential (PD) controller was used to boost the exploration of TD3 during training. Second, a novel reward formulation for UAV-based target tracking enabled a careful combination of the various dynamic variables in the reward function: two exponential functions limit the effect of velocity and acceleration, preventing deformation of the policy function approximation. In addition, the concept of multistage training based on the dynamic variables was proposed as an alternative to one-stage combinatory training. Third, the reward function was enhanced with a piecewise decomposition, enabling more stable learning behaviour of the policy and moving from a linear reward to an achievement formula. Training was conducted on fixed target tracking followed by moving target tracking. Flight testing was conducted on three types of target trajectories: fixed, square, and blinking. Multistage training with both exponential and achievement rewarding achieved the best performance for the fixed-trained agent on the fixed and square moving targets, while the combined agent with both exponential and achievement rewarding performed best for a fixed-trained agent in the case of a blinking target.
Relative to the traditional proportional-differential controller, the maximum error reduction rate is 86%. The developed achievement rewarding and the multistage training open the door to various applications of RL in target tracking.
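The reward shaping summarised above can be sketched as a position-error penalty with exponentially saturated velocity and acceleration terms, plus a piecewise achievement bonus near the target. The function below is an illustrative reconstruction only; the coefficients, thresholds, and exact functional form are assumptions, not the paper's actual formulation.

```python
import math

def tracking_reward(pos_error, velocity, acceleration,
                    k_e=1.0, k_v=0.5, k_a=0.5,
                    goal_radius=0.2, bonus=10.0):
    """Illustrative reward sketch: linear position-error penalty,
    exponentially saturated velocity/acceleration penalties, and a
    piecewise 'achievement' bonus inside the goal radius.
    All gains and thresholds are hypothetical."""
    # Linear penalty on the tracking (position) error.
    r = -k_e * pos_error
    # Exponential saturation bounds the influence of velocity and
    # acceleration, so large transients cannot dominate the reward.
    r -= k_v * (1.0 - math.exp(-abs(velocity)))
    r -= k_a * (1.0 - math.exp(-abs(acceleration)))
    # Piecewise achievement term: discrete bonus when the UAV is
    # within the goal radius of the target.
    if pos_error < goal_radius:
        r += bonus
    return r
```

Because the velocity and acceleration penalties saturate at `k_v` and `k_a`, the agent is never pushed toward degenerate policies by extreme dynamic values, which is the stated motivation for the exponential terms.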

Highlights

  • Unmanned aerial vehicle (UAV) applications are increasing day by day, and aerial vehicles are being used as part of many recent technological applications

  • It was observed that the Multilevel-Achievement (MLA), Combined-AchievementExponential (CAE) and Combined-Achievement (CA) agents all performed much better than both the proportional-differential (PD) controller and the Combined (C) agent, which is just a classical TD3-based model with no modifications

  • The error ranges show that adding an achievement term to TD3 improves the tracking performance and reduces the error. Combining the achievement rewarding formula with the exponential weighting terms performs better than using achievement rewarding alone. Another observation is that the boxplot width is reduced for the achievement-based agents, namely CA, CAE, MLA and Multilevel-AchievementExponential (MLAE), which means more stable performance when they are trained on a fixed target, i.e., the F-agent
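The PD controller mentioned in these highlights serves both as the comparison baseline and, per the abstract, as an exploration booster during TD3 training. A minimal sketch of such PD-guided exploration follows, assuming illustrative gains and a linear annealing schedule from PD command to learned policy; none of these specifics come from the paper.

```python
import numpy as np

def pd_action(error, error_rate, kp=1.2, kd=0.3):
    # Classical proportional-differential command on the tracking error.
    return kp * error + kd * error_rate

def exploration_action(policy_action, error, error_rate,
                       step, warmup=10_000, noise_std=0.1):
    """Blend the PD command with the learned policy action during
    training, annealing toward pure policy output (hypothetical
    schedule), then add TD3's usual Gaussian exploration noise."""
    alpha = min(step / warmup, 1.0)   # 0 -> pure PD, 1 -> pure policy
    guided = (1 - alpha) * pd_action(error, error_rate) \
             + alpha * policy_action
    return guided + np.random.normal(0.0, noise_std,
                                     size=np.shape(guided))
```

Early in training the agent therefore visits the informative states a competent PD controller reaches, which is one plausible reading of "boosting the exploration of TD3".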



Introduction

Unmanned aerial vehicle (UAV) applications are increasing day by day, and aerial vehicles are being used in many recent technological applications, for example shipping [1], surveillance [2], [3], [4], battlefield operations [5], rescue [6], [7], and inspection [8], [9]. Aerial vehicles are divided into three categories: teleoperated [10], [11], semi-autonomous [12], [13], and fully autonomous [14]. Enabling aerial vehicle applications requires essential autonomy features within the system: an autonomous vehicle must be able to decide on and react to events without direct human intervention. Some fundamental aspects are common to all autonomous vehicles, including sensing and perceiving the environment and analysing the gained information, …


