Abstract

An intelligent solution method is proposed to achieve real-time optimal control of continuous-time nonlinear systems using a novel identifier-actor-optimizer (IAO) policy learning architecture. In this IAO-based policy learning approach, a dynamical identifier is developed to approximate the unknown part of the system dynamics using deep neural networks (DNNs). Then, an indirect-method-based optimizer is proposed to generate high-quality optimal actions for system control, considering both the constraints and the performance index. Furthermore, a DNN-based actor is developed to approximate the obtained optimal actions and to return good initial guesses to the optimizer. In this way, traditional optimal control methods and state-of-the-art DNN techniques are combined in the IAO-based optimal policy learning method. Compared with reinforcement learning algorithms built on actor-critic architectures, which suffer from difficult reward design and low computational efficiency, the IAO-based optimal policy learning algorithm requires fewer user-defined parameters and offers higher learning speed and more stable convergence in solving complex continuous-time optimal control problems (OCPs). Simulation results for three space flight control missions substantiate the effectiveness of this IAO-based policy learning strategy and illustrate the performance of the developed DNN-based optimal control method for continuous-time OCPs.
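To make the three roles concrete, the following is a minimal sketch of one IAO learning step, assuming PyTorch. All names here (`Identifier`, `Actor`, `solve_ocp_indirect`, `iao_step`) are hypothetical illustrations, not the paper's actual implementation, and the indirect-method solver is stubbed out with a placeholder, since the real optimizer would solve the boundary-value problem arising from the optimality conditions.

```python
# Hypothetical sketch of the identifier-actor-optimizer (IAO) loop; not the
# paper's code. Assumes PyTorch; the indirect-method optimizer is a stub.
import torch
import torch.nn as nn


class Identifier(nn.Module):
    """DNN approximating the unknown part of the system dynamics."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, x, u):
        return self.net(torch.cat([x, u], dim=-1))


class Actor(nn.Module):
    """DNN approximating the optimizer's optimal actions."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, x):
        return self.net(x)


def solve_ocp_indirect(x0, initial_guess):
    """Placeholder for the indirect-method optimizer. In the paper this
    produces optimal actions subject to the constraints and performance
    index, warm-started by the actor's guess; here it just returns the
    guess so the sketch runs end to end."""
    return initial_guess


def iao_step(identifier, actor, x, u, x_next, id_opt, actor_opt):
    # 1) Identifier: fit the unknown dynamics from observed transitions,
    #    here as a one-step residual model x' - x ~ f(x, u).
    id_loss = nn.functional.mse_loss(identifier(x, u), x_next - x)
    id_opt.zero_grad(); id_loss.backward(); id_opt.step()

    # 2) Optimizer: compute (near-)optimal actions, warm-started by the actor.
    with torch.no_grad():
        u_star = solve_ocp_indirect(x, actor(x))

    # 3) Actor: regress onto the optimizer's optimal actions, so it can
    #    supply good initial guesses on the next iteration.
    actor_loss = nn.functional.mse_loss(actor(x), u_star)
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    return id_loss.item(), actor_loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    identifier, actor = Identifier(4, 1), Actor(4, 1)
    id_opt = torch.optim.Adam(identifier.parameters(), lr=1e-3)
    actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
    # Synthetic (x, u, x') batch standing in for simulated rollouts.
    x, u = torch.randn(32, 4), torch.randn(32, 1)
    x_next = x + 0.01 * torch.randn(32, 4)
    print(iao_step(identifier, actor, x, u, x_next, id_opt, actor_opt))
```

The key design point this sketch reflects is the division of labor: the identifier and actor are trained by plain supervised regression, while optimality is delegated to the (here stubbed) indirect-method solver, which avoids the reward shaping required by actor-critic reinforcement learning.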
