Abstract

A relatively new approach to adaptive flight control is the use of reinforcement learning methods. Controllers that apply reinforcement learning learn by interacting with the environment, and their ability to adapt online makes them especially useful in adaptive and reconfigurable flight control systems. This paper focuses on a group of reinforcement learning methods, called Adaptive Critic Designs (ACD), that are characterized by their subdivision of tasks and components (actor/critic). One specific ACD that has previously been implemented in a helicopter flight control system is Action Dependent Heuristic Dynamic Programming (ADHDP). In ADHDP controllers, information is exchanged between the actor and the critic by connecting the actor output directly to the critic input, even though the actor output is not a necessary input for the critic to accurately estimate the optimal value function. This paper implements an alternative approach to this information exchange, in which the actor network is disconnected from the critic and is instead updated through a neural network that approximates the dynamics of the controlled plant. The approximated plant dynamics network is then updated online to adapt to changes in the plant dynamics. This alternative controller is called the action-independent Heuristic Dynamic Programming (HDP) controller using approximated plant dynamics. The goal of this paper is to gain insight into the theoretical and practical differences between the ADHDP and the HDP controller when applied in an online environment with changing plant dynamics. To investigate the practical differences, the ADHDP and HDP controllers are implemented for a model of the General Dynamics F-16, and the characteristics of the controllers are investigated and compared in two types of experiments. First, the controllers are trained offline to control the baseline F-16 model; next, the dynamics of the F-16 model are changed online and the controllers must adapt to the new plant dynamics. The results from the offline experiments show that the HDP controller with the approximated plant dynamics has a higher success ratio for learning to control the baseline F-16 model. Both the baseline ADHDP and HDP controllers can already cope with some changes in the plant dynamics without adapting themselves. The online experiments further show that the HDP controller outperforms the ADHDP controller in adapting to changed plant dynamics. The HDP controller is more sensitive to measurement noise than the ADHDP controller, but can be used with a wider range of initial flight conditions.
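To make the architectural difference concrete, the following minimal sketch contrasts the two actor-update paths described above. It is not the paper's implementation: the linear "networks", the state/action dimensions, and all variable names are illustrative assumptions (real ACD controllers typically use multilayer perceptrons), but the gradient paths match the abstract's description: ADHDP reads the actor gradient directly off the action-dependent critic, while action-independent HDP chains the critic gradient through an (online-identified) plant dynamics model.

```python
# Minimal sketch of the two actor-update paths (assumed linear networks).
import numpy as np

rng = np.random.default_rng(0)
n_s, n_a = 4, 1                                      # state/action sizes (assumed)
W_actor  = rng.normal(size=(n_a, n_s)) * 0.1         # actor:        a = W_actor @ s
W_critic = rng.normal(size=(1, n_s)) * 0.1           # HDP critic:   V(s) = W_critic @ s
W_q      = rng.normal(size=(1, n_s + n_a)) * 0.1     # ADHDP critic: Q(s, a)
F = rng.normal(size=(n_s, n_s)) * 0.1                # approximated plant model:
G = rng.normal(size=(n_s, n_a)) * 0.1                #   s_next = F @ s + G @ a

s = rng.normal(size=(n_s, 1))
a = W_actor @ s

# ADHDP: actor output is wired into the critic, so the actor gradient is
# dQ/da, read off the action-dependent columns of the critic weights.
dQ_da = W_q[:, n_s:]                                 # shape (1, n_a)
grad_adhdp = dQ_da.T @ s.T                           # chain a = W_actor @ s

# Action-independent HDP: the critic never sees the action. The actor
# gradient is chained through the plant model: dV(s_next)/da = dV/ds_next * ds_next/da.
dV_ds_next = W_critic                                # shape (1, n_s)
ds_next_da = G                                       # from the identified model
grad_hdp = (dV_ds_next @ ds_next_da).T @ s.T         # shape (n_a, n_s)

lr = 1e-2
W_actor += lr * grad_hdp                             # ascend the estimated value
```

Note how, in the HDP path, the plant model matrix `G` is the only link between critic and actor; updating `F` and `G` online (system identification) is what lets this controller track changing plant dynamics without re-wiring the critic.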
