Reinforcement Learning Control for a 2-DOF Helicopter With State Constraints: Theory and Experiments

Zhijia Zhao, Tao Zou, Han-Xiong Li, Weitian He, Chaoxu Mu, Keum-Shik Hong

doi:10.1109/tase.2022.3215738

Abstract

This study presents a novel reinforcement learning control strategy for a nonlinear two-degrees-of-freedom (2-DOF) helicopter system that tracks a desired trajectory while minimizing the tracking error. First, a gradient descent algorithm is incorporated into the reinforcement learning control scheme to obtain the adaptive laws. Subsequently, to account for the uncertainties in the nonlinear system, radial basis function (RBF) neural networks (NNs) are used to approximate the unknown internal dynamics. In contrast to previous studies, and with the aim of accelerating convergence in reinforcement learning control, a barrier Lyapunov function is constructed to constrain the states so that the tracking error rapidly converges to a neighborhood of zero. Under the proposed control strategy, the states of the closed-loop system are proven to be semi-globally uniformly ultimately bounded through rigorous Lyapunov analysis, and the state constraints are satisfied. Furthermore, simulations and experiments conducted on a Quanser laboratory platform show that the proposed control strategy is feasible and effective.

Note to Practitioners: This paper is motivated by the design of a reinforcement learning control strategy that enhances the online learning capability and control performance of the controller for a nonlinear 2-DOF helicopter system. The control framework is divided into the design of a critic NN and an actor NN, which are primarily responsible for evaluating the control performance and approximating the uncertainties in the system, respectively. Unlike in adaptive NN control, the actor NN weights are updated by combining state and input information from the critic NN. In addition, to accelerate convergence, a barrier Lyapunov function is constructed to constrain the states so that the tracking error rapidly converges to a neighborhood of zero. Finally, the proposed control strategy is validated in simulation and in experiments on the Quanser laboratory platform.
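
The two building blocks named in the abstract, an RBF NN approximator and a barrier Lyapunov function, can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes Gaussian basis functions and the common log-type barrier V = 0.5 ln(k_b^2 / (k_b^2 - z^2)), neither of which the abstract specifies. The names `rbf_features`, `rbf_output`, `barrier_lyapunov`, `centers`, `width`, and `k_b` are hypothetical and chosen only for illustration.

```python
import math


def rbf_features(x, centers, width):
    """Gaussian RBF activations for input vector x (assumed basis form)."""
    return [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / width ** 2)
        for c in centers
    ]


def rbf_output(weights, x, centers, width):
    """Linear-in-weights NN output: the weighted sum of the RBF activations."""
    return sum(w * s for w, s in zip(weights, rbf_features(x, centers, width)))


def barrier_lyapunov(z, k_b):
    """Log-type barrier Lyapunov function (an assumed form).

    It is zero at z = 0 and grows without bound as |z| approaches the
    constraint k_b, which is how a barrier term penalizes a tracking error
    that nears the boundary of the allowed region.
    """
    if abs(z) >= k_b:
        raise ValueError("tracking error is outside the constraint |z| < k_b")
    return 0.5 * math.log(k_b ** 2 / (k_b ** 2 - z ** 2))
```

For example, `barrier_lyapunov(0.9, 1.0)` is larger than `barrier_lyapunov(0.5, 1.0)`, reflecting the increasing penalty as the error nears the bound.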

Full Text