Practical systems must often be operated with only limited knowledge of their dynamics, which calls for a framework that learns and controls the system simultaneously while coping with uncertain dynamics. This paper proposes a model-free aperiodic tracking control method based on MAXQ hierarchical reinforcement learning for discrete-time systems with unknown dynamics. Firstly, a MAXQ hierarchical framework is developed for aperiodic tracking control that decomposes the task into a pre-control subtask and an accurate-control subtask. The former achieves sub-optimal tracking, which serves as a starting point for accurate control; the latter implements aperiodic tracking on top of the pre-control result to reduce resource occupation. The MAXQ structure reduces the complexity of the tracking task, and the incorporation of pre-control accelerates learning in the formal control stage. Secondly, since the control inputs and the control update instants are distinct decisions, pre-control is decomposed into separately learning a triggering strategy and a control policy, where the control policy learns optimal control inputs by minimizing a cost function. The triggering strategy and control policy are likewise learned separately in accurate control. In contrast to traditional aperiodic triggering mechanisms, the triggering strategy in the accurate-control stage is built on the cumulative error of the pre-control result, which avoids the influence of accidental factors and the accumulation of errors, enhancing system robustness. Thirdly, the proposed tracking control uses only the input, output, and reference signal data of the system, without relying on knowledge of the system dynamics; it is applicable to both linear and nonlinear systems, demonstrating strong generalizability.
Finally, simulation examples are provided to validate the effectiveness and superiority of the proposed method.
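The contrast between a traditional aperiodic trigger and the cumulative-error trigger described above can be illustrated with a minimal sketch. The function names, the absolute-error accumulation, and the threshold values are illustrative assumptions, not the paper's formulation:

```python
def instantaneous_trigger(error: float, threshold: float) -> bool:
    # Traditional aperiodic mechanism (simplified): a control update fires
    # whenever a single error sample exceeds the threshold, so a one-off
    # spike from an accidental factor forces an update.
    return abs(error) > threshold


def cumulative_trigger(errors_since_update: list[float], threshold: float) -> bool:
    # Cumulative-error mechanism (sketch): a control update fires only when
    # the error accumulated since the last update exceeds the threshold.
    # An isolated spike is absorbed, while persistent drift still triggers.
    return sum(abs(e) for e in errors_since_update) > threshold


# A single 2.0 spike fires the instantaneous trigger but, with a cumulative
# threshold of 3.0, is absorbed by the cumulative one; sustained moderate
# error eventually fires the cumulative trigger instead.
spiky = [0.1, 0.1, 2.0, 0.1]
drift = [0.5] * 7
print(instantaneous_trigger(2.0, 1.0))        # single spike fires
print(cumulative_trigger(spiky, 3.0))         # spike absorbed
print(cumulative_trigger(drift, 3.0))         # persistent drift fires
```

This captures only the qualitative robustness argument of the abstract; in the proposed method the triggering strategy is itself learned within the MAXQ hierarchy rather than fixed by a hand-set threshold.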