Abstract

Guiding an aircraft to 4D waypoints at a specified heading is a multi-dimensional goal aircraft guidance problem. To solve this problem and improve performance, this paper proposes a multi-layer reinforcement learning (RL) approach. The approach enables the autopilot in an ATC simulator to guide an aircraft to 4D waypoints defined by latitude, longitude, altitude, arrival time, and heading. Specifically, the multi-layer structure simplifies the neural network and reduces the dimensionality of the state space. A shaped reward function based on a potential function and the Dubins path method is applied. Experimental and simulation results show that the proposed approach significantly improves convergence efficiency and trajectory performance. Furthermore, the results indicate promising applications in team aircraft guidance tasks, since an aircraft can fly directly toward its goal without waiting in a specific pattern, thereby overcoming a limitation of current ATC simulators.
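
The Dubins path mentioned above is the shortest curve between two poses (position plus heading) under a minimum turning radius; it is the minimum over six candidate curve families (LSL, RSR, LSR, RSL, RLR, LRL). As a minimal geometric illustration, the sketch below computes the length of only the simplest family, LSL (left turn, straight, left turn), with made-up coordinates and radius; it is not the construction used in the paper.

```python
import math

def dubins_lsl_length(start, goal, radius):
    """Length of the LSL (left-straight-left) Dubins path.
    start, goal: (x, y, heading_rad); radius: minimum turning radius."""
    x0, y0, th0 = start
    x1, y1, th1 = goal
    # Centers of the left-turn circles (90 degrees to the left of each heading).
    c0 = (x0 - radius * math.sin(th0), y0 + radius * math.cos(th0))
    c1 = (x1 - radius * math.sin(th1), y1 + radius * math.cos(th1))
    dx, dy = c1[0] - c0[0], c1[1] - c0[1]
    straight = math.hypot(dx, dy)          # tangent segment between the two circles
    phi = math.atan2(dy, dx)               # heading along the straight segment
    arc0 = (phi - th0) % (2 * math.pi)     # left turn from th0 onto the tangent
    arc1 = (th1 - phi) % (2 * math.pi)     # left turn from the tangent onto th1
    return radius * (arc0 + arc1) + straight

# Example: waypoint 5 km ahead and 2 km to the left, to be crossed heading north.
print(dubins_lsl_length((0.0, 0.0, 0.0), (5_000.0, 2_000.0, math.pi / 2), radius=1_000.0))
```

The full Dubins distance would take the minimum of the analogous lengths over all six families; only the LSL case is shown here to keep the example short.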

Highlights

  • RL research belongs to the category of the Markov decision process (MDP) [28], which addresses decision optimization and can be defined as M = (S, A, P, γ, R), where S is the set of environment states; A is the set of actions the agent can select; R is the reward function; P is the state-transition probability function; and γ is the discount factor that determines the contribution of future rewards. s_t, a_t, p_t, and r_t denote the current state, the selected action, the transition probability, and the reward obtained from the environment, respectively.

  • The guidance problem is formulated as an MDP, and the aircraft is controlled by selecting the heading, changing the vertical velocity, and altering the horizontal velocity (a minimal environment sketch follows this list).
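
As a concrete reading of the MDP tuple above, the sketch below casts the guidance task as a Gym-style environment with a point-mass aircraft whose actions change heading, vertical velocity, and horizontal velocity. All state variables, bounds, and reward constants are illustrative assumptions, not the implementation described in the paper.

```python
# Minimal sketch of the guidance task as an MDP M = (S, A, P, gamma, R).
# Names, dynamics, and constants are illustrative assumptions only.
import numpy as np

class AircraftGuidanceEnv:
    """Toy point-mass aircraft guided toward a waypoint (x, y, z)."""

    def __init__(self, waypoint=(50_000.0, 30_000.0, 3_000.0), dt=1.0, gamma=0.99):
        self.waypoint = np.asarray(waypoint, dtype=float)
        self.dt = dt              # simulation step [s]
        self.gamma = gamma        # discount factor of the MDP
        self.reset()

    def reset(self):
        # State s_t: position (x, y, z), heading psi, horizontal speed v, vertical speed vz
        self.state = np.array([0.0, 0.0, 2_000.0, 0.0, 200.0, 0.0])
        return self.state.copy()

    def step(self, action):
        # Action a_t: (heading change [rad], vertical-speed change [m/s], horizontal-speed change [m/s])
        d_psi, d_vz, d_v = action
        x, y, z, psi, v, vz = self.state
        psi += d_psi
        v = np.clip(v + d_v, 150.0, 250.0)
        vz = np.clip(vz + d_vz, -15.0, 15.0)
        # Deterministic transition (P collapses to a single successor state here).
        x += v * np.cos(psi) * self.dt
        y += v * np.sin(psi) * self.dt
        z += vz * self.dt
        self.state = np.array([x, y, z, psi, v, vz])
        # Reward r_t: negative distance to the waypoint, plus a terminal bonus.
        dist = np.linalg.norm(self.state[:3] - self.waypoint)
        done = dist < 500.0
        reward = -1e-4 * dist + (100.0 if done else 0.0)
        return self.state.copy(), reward, done, {}
```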


Summary

Introduction

Aircraft guidance [1,2,3,4], especially high-dimensional aircraft guidance, has gradually emerged as a significant research focus, owing to its application prospects in complex flight tasks under realistic conditions. To address the aircraft guidance problem, a new reward function was proposed in [21] to improve the quality of the generated trajectories and the training efficiency. In the present work, a multi-layer RL approach with a reward shaping algorithm is proposed for the multi-dimensional goal aircraft guidance task, wherein an aircraft is guided to waypoints at a specified latitude, longitude, altitude, heading angle, and arrival time. A trained agent controls the aircraft by selecting the heading, changing the vertical velocity, and altering the horizontal velocity, based on an improved multi-layer RL algorithm with a shaped reward function. A multi-layer RL model and an intelligent aircraft guidance approach are presented to perform the multi-dimensional goal guidance task by reducing the state-space dimensions and simplifying the neural network structure. The remainder of the present study is organized as follows: in Section 2, the background concepts on the Dubins path and RL are introduced, along with the variants used in the present work; in Section 3, the RL formulation of the aircraft guidance task is presented; in Section 4, the environment settings and the structure of the model are described in detail; in Section 5, numerical simulation results and discussion are given; and, in Section 6, the conclusions of the present study are provided.
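
The reward shaping mentioned above generally builds on potential-based shaping, r'(s, a, s') = r(s, a, s') + γΦ(s') − Φ(s), which densifies the learning signal without changing the optimal policy. The sketch below illustrates this general technique under the assumption that the potential Φ is simply the negative Euclidean distance to the waypoint; the paper instead derives its potential from the Dubins path, which this sketch does not reproduce.

```python
# Minimal sketch of potential-based reward shaping.  The potential used here
# (negative straight-line distance to the goal) is an illustrative assumption,
# standing in for the Dubins-path-based potential described in the paper.
import numpy as np

def potential(state, waypoint):
    """Phi(s): larger (less negative) as the aircraft gets closer to the waypoint."""
    return -np.linalg.norm(np.asarray(state[:3], dtype=float) - np.asarray(waypoint, dtype=float))

def shaped_reward(base_reward, state, next_state, waypoint, gamma=0.99):
    """r'(s, a, s') = r(s, a, s') + gamma * Phi(s') - Phi(s)."""
    return base_reward + gamma * potential(next_state, waypoint) - potential(state, waypoint)
```

For example, shaped_reward(0.0, s, s_next, waypoint) is positive for any transition that brings the aircraft closer to the waypoint, so the agent receives useful feedback long before it ever reaches the goal.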

Dubins Path
Basics of Reinforcement Learning
Policy-Based RL
RL Formulation
Fly to Waypoints
Multi-Layer RL Algorithm
State Space
Action Space
Termination State
Reward Function Design
Experiment Setup
Models and Training
Models
Training
Analysis of Results
Without Considering Arrival Time
Considering Arrival Time
Multi Aircraft Performance
Findings
Conclusions
