Abstract

Guiding an aircraft to 4D waypoints at a specified heading is a multi-dimensional goal aircraft guidance problem. To solve this problem and improve performance, this paper proposes a multi-layer reinforcement learning (RL) approach. The approach enables the autopilot in an ATC simulator to guide an aircraft to 4D waypoints defined by latitude, longitude, altitude, arrival time, and heading. Specifically, the multi-layer structure simplifies the neural network and reduces the dimensionality of the state space. A shaped reward function based on a potential function and the Dubins path method is applied. Experimental and simulation results show that the proposed approach significantly improves convergence efficiency and trajectory performance. Furthermore, the results indicate promising applications in team aircraft guidance tasks, since an aircraft can fly directly toward its goal without waiting in a specific pattern, thereby overcoming a limitation of current ATC simulators.
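
The Dubins path mentioned above is the shortest curve between two poses (position plus heading) under a minimum turning radius; it is the minimum over six candidate curve families (LSL, RSR, LSR, RSL, RLR, LRL). As a minimal geometric illustration, the sketch below computes the length of only the simplest family, LSL (left turn, straight, left turn), with made-up coordinates and radius; it is not the construction used in the paper.

```python
import math

def dubins_lsl_length(start, goal, radius):
    """Length of the LSL (left-straight-left) Dubins path.
    start, goal: (x, y, heading_rad); radius: minimum turning radius."""
    x0, y0, th0 = start
    x1, y1, th1 = goal
    # Centers of the left-turn circles (90 degrees to the left of each heading).
    c0 = (x0 - radius * math.sin(th0), y0 + radius * math.cos(th0))
    c1 = (x1 - radius * math.sin(th1), y1 + radius * math.cos(th1))
    dx, dy = c1[0] - c0[0], c1[1] - c0[1]
    straight = math.hypot(dx, dy)          # tangent segment between the two circles
    phi = math.atan2(dy, dx)               # heading along the straight segment
    arc0 = (phi - th0) % (2 * math.pi)     # left turn from th0 onto the tangent
    arc1 = (th1 - phi) % (2 * math.pi)     # left turn from the tangent onto th1
    return radius * (arc0 + arc1) + straight

# Example: waypoint 5 km ahead and 2 km to the left, to be crossed heading north.
print(dubins_lsl_length((0.0, 0.0, 0.0), (5_000.0, 2_000.0, math.pi / 2), radius=1_000.0))
```

The full Dubins distance would take the minimum of the analogous lengths over all six families; only the LSL case is shown here to keep the example short.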

Highlights

  • RL research belongs to the category of the Markov decision process (MDP) [28], which addresses decision optimization and can be defined as M = (S, A, P, γ, R), where S is the set of environment states; A is the set of actions the agent can select; R is the reward function; P is the state-transition probability function; and γ is the discount factor that determines the contribution of future rewards. s_t, a_t, p_t, and r_t denote the current state, the selected action, the transition probability, and the reward obtained from the environment, respectively.

  • The guidance problem is formulated as an MDP, and the aircraft is controlled by selecting the heading, changing the vertical velocity, and altering the horizontal velocity (a minimal environment sketch follows this list).
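
As a concrete reading of the MDP tuple above, the sketch below casts the guidance task as a Gym-style environment with a point-mass aircraft whose actions change heading, vertical velocity, and horizontal velocity. All state variables, bounds, and reward constants are illustrative assumptions, not the implementation described in the paper.

```python
# Minimal sketch of the guidance task as an MDP M = (S, A, P, gamma, R).
# Names, dynamics, and constants are illustrative assumptions only.
import numpy as np

class AircraftGuidanceEnv:
    """Toy point-mass aircraft guided toward a waypoint (x, y, z)."""

    def __init__(self, waypoint=(50_000.0, 30_000.0, 3_000.0), dt=1.0, gamma=0.99):
        self.waypoint = np.asarray(waypoint, dtype=float)
        self.dt = dt              # simulation step [s]
        self.gamma = gamma        # discount factor of the MDP
        self.reset()

    def reset(self):
        # State s_t: position (x, y, z), heading psi, horizontal speed v, vertical speed vz
        self.state = np.array([0.0, 0.0, 2_000.0, 0.0, 200.0, 0.0])
        return self.state.copy()

    def step(self, action):
        # Action a_t: (heading change [rad], vertical-speed change [m/s], horizontal-speed change [m/s])
        d_psi, d_vz, d_v = action
        x, y, z, psi, v, vz = self.state
        psi += d_psi
        v = np.clip(v + d_v, 150.0, 250.0)
        vz = np.clip(vz + d_vz, -15.0, 15.0)
        # Deterministic transition (P collapses to a single successor state here).
        x += v * np.cos(psi) * self.dt
        y += v * np.sin(psi) * self.dt
        z += vz * self.dt
        self.state = np.array([x, y, z, psi, v, vz])
        # Reward r_t: negative distance to the waypoint, plus a terminal bonus.
        dist = np.linalg.norm(self.state[:3] - self.waypoint)
        done = dist < 500.0
        reward = -1e-4 * dist + (100.0 if done else 0.0)
        return self.state.copy(), reward, done, {}
```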


Summary

Introduction

Aircraft guidance [1,2,3,4], especially high-dimensional aircraft guidance, has gradually emerged as a significant research focus, owing to its application prospects in complex flight tasks under realistic conditions. To address the aircraft guidance problem, a new reward function was proposed in [21] to improve the quality of the generated trajectories and the training efficiency. In the present work, a multi-layer RL approach with a reward shaping algorithm is proposed for the multi-dimensional goal aircraft guidance task, wherein an aircraft is guided to waypoints at a specified latitude, longitude, altitude, heading angle, and arrival time. A trained agent controls the aircraft by selecting the heading, changing the vertical velocity, and altering the horizontal velocity, based on an improved multi-layer RL algorithm with a shaped reward function. A multi-layer RL model and an intelligent aircraft guidance approach are presented to perform the multi-dimensional goal guidance task by reducing the state-space dimensions and simplifying the neural network structure. The remainder of the present study is organized as follows: in Section 2, the background concepts on the Dubins path and RL are introduced, along with the variants used in the present work; in Section 3, the RL formulation of the aircraft guidance task is presented; in Section 4, the environment settings and the structure of the model are described in detail; in Section 5, numerical simulation results and discussion are given; and, in Section 6, the conclusions of the present study are provided.
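
The reward shaping mentioned above generally builds on potential-based shaping, r'(s, a, s') = r(s, a, s') + γΦ(s') − Φ(s), which densifies the learning signal without changing the optimal policy. The sketch below illustrates this general technique under the assumption that the potential Φ is simply the negative Euclidean distance to the waypoint; the paper instead derives its potential from the Dubins path, which this sketch does not reproduce.

```python
# Minimal sketch of potential-based reward shaping.  The potential used here
# (negative straight-line distance to the goal) is an illustrative assumption,
# standing in for the Dubins-path-based potential described in the paper.
import numpy as np

def potential(state, waypoint):
    """Phi(s): larger (less negative) as the aircraft gets closer to the waypoint."""
    return -np.linalg.norm(np.asarray(state[:3], dtype=float) - np.asarray(waypoint, dtype=float))

def shaped_reward(base_reward, state, next_state, waypoint, gamma=0.99):
    """r'(s, a, s') = r(s, a, s') + gamma * Phi(s') - Phi(s)."""
    return base_reward + gamma * potential(next_state, waypoint) - potential(state, waypoint)
```

For example, shaped_reward(0.0, s, s_next, waypoint) is positive for any transition that brings the aircraft closer to the waypoint, so the agent receives useful feedback long before it ever reaches the goal.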

Dubins Path
Basics of Reinforcement Learning
Policy-Based RL
RL Formulation
Fly to Waypoints
Multi-Layer RL Algorithm
State Space
Action Space
Termination State
Reward Function Design
Experiment Setup
Models and Training
Models
Training
Analysis of Results
Without Considering Arrival Time
Considering Arrival Time
Multi Aircraft Performance
Findings
Conclusions
