This paper investigates the robust optimal control problem for a class of continuous-time, partially linear, interconnected systems. In addition to the dynamic uncertainties arising from the interconnected dynamics, unknown bounded disturbances are taken into account throughout the learning process, and both the system dynamics and the disturbances are assumed unknown. These challenges render the collected online data imperfect. In this scenario, traditional data-driven control techniques, such as adaptive dynamic programming (ADP) and robust ADP, struggle to approximate the optimal control policy precisely because of imperfect data and computational errors. In this paper, a novel data-driven robust policy iteration method is proposed to solve the robust optimal control problem. The proposed method requires access only to the input and partial state information, without relying on knowledge of the system dynamics, the external disturbances, or the complete state. Based on the small-gain theorem and the notions of strong unboundedness observability and input-to-output stability, it is guaranteed that the learned robust optimal control gain is stabilizing and that the solution of the closed-loop system is uniformly ultimately bounded despite the dynamic uncertainties and unknown external disturbances. Simulation results demonstrate the effectiveness and practicality of the proposed data-driven control method.

Note to Practitioners—This work is motivated by the use of reinforcement learning to improve the design of adaptive optimal controllers for engineering applications. Adaptive dynamic programming methods, in particular policy iteration (PI), are widely used to solve optimal control problems. However, because of the iterative nature of PI, the approximated optimal control policy may be inaccurate, especially when imperfect measurements of the system are used in place of modelling information. This can cause the learned control policy to deviate from the actual optimal policy, and the problem becomes more challenging in the presence of dynamic uncertainties and unknown external disturbances, which corrupt the measurements and result in imperfect data. This work establishes conditions on the uncertainties under which the proposed data-driven PI algorithm is robust to system uncertainties, unknown external disturbances, and imperfect measurements. The approximated robust optimal control policy performs robustly in the presence of imperfect data and uncertainties while remaining close to the optimal control policy.
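For context, the data-driven robust PI method described above builds on the classical policy-iteration scheme for linear-quadratic problems. The sketch below is a minimal model-based version of that scheme (Kleinman's algorithm), assuming full knowledge of the plant matrices (A, B); it is illustrative only and does not reproduce the paper's data-driven, partial-state variant, which replaces the model-based policy-evaluation step with online input and partial-state data and accounts for dynamic uncertainties and disturbances. The function name, the double-integrator plant, and the weighting matrices are assumed for illustration.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def kleinman_pi(A, B, Q, R, K0, tol=1e-8, max_iter=50):
    """Model-based policy iteration (Kleinman's algorithm) for the LQR problem.

    Starting from a stabilizing gain K0, each iteration performs policy
    evaluation (a Lyapunov equation) followed by policy improvement,
    converging to the optimal gain K* = R^{-1} B^T P*.
    """
    K = K0
    P_prev = None
    for _ in range(max_iter):
        Ak = A - B @ K
        # Policy evaluation: solve Ak^T P + P Ak + Q + K^T R K = 0
        P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
        # Policy improvement: K <- R^{-1} B^T P
        K = np.linalg.solve(R, B.T @ P)
        if P_prev is not None and np.linalg.norm(P - P_prev) < tol:
            break
        P_prev = P
    return K, P

# Example usage on a simple double-integrator plant (illustrative values).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 1.0]])   # any stabilizing initial gain
K, P = kleinman_pi(A, B, Q, R, K0)
```

In a data-driven setting such as the one considered in the paper, the Lyapunov-equation step above cannot be evaluated directly because A and B are unknown; it is instead replaced by least-squares identities formed from measured input and (partial) state trajectories, which is precisely where imperfect data and disturbances enter the iteration.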