Reinforcement learning (RL) has been successfully employed as a powerful tool in designing adaptive optimal controllers. Recently, off-policy learning has emerged as a way to design optimal controllers for systems with completely unknown dynamics. However, current approaches to optimal tracking control design either result in bounded, rather than zero, tracking error, or require partial knowledge of the system dynamics. Moreover, they usually require collecting a large data set to learn the optimal solution. To overcome these limitations, this paper applies a combination of off-policy learning and experience replay to output-regulation tracking control of continuous-time linear systems with completely unknown dynamics. To this end, an off-policy integral RL technique is first used to obtain the optimal feedback control gain and, using the same data, to explicitly identify the system dynamics involved. Second, a data-efficient experience-replay method is developed to compute the exosystem dynamics. Finally, the output regulator equations are solved using data measured online. It is shown that the proposed control method stabilizes the closed-loop tracking-error dynamics and provides an explicit exponential convergence rate for the output tracking error. Simulation results show the effectiveness of the proposed approach.
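The final step above solves the linear output regulator (Francis) equations once the plant and exosystem matrices have been identified. As a minimal illustrative sketch (not the paper's implementation), the equations X S = A X + B U + E and 0 = C X + D U + F can be solved by Kronecker vectorization; all matrices and the toy example below are assumed placeholders, not taken from the paper.

```python
# Sketch: solve the output regulator equations
#   X S = A X + B U + E,   0 = C X + D U + F
# for (X, U) via Kronecker-product vectorization, given identified matrices.
import numpy as np

def solve_regulator_equations(A, B, C, D, E, F, S):
    """Return (X, U) satisfying the linear output-regulation equations."""
    n, m = B.shape          # state and input dimensions
    p = C.shape[0]          # tracking-error dimension
    q = S.shape[0]          # exosystem dimension
    I_n, I_q = np.eye(n), np.eye(q)

    # vec(X S - A X - B U) = vec(E)  and  vec(C X + D U) = vec(-F)
    row1 = np.hstack([np.kron(S.T, I_n) - np.kron(I_q, A), -np.kron(I_q, B)])
    row2 = np.hstack([np.kron(I_q, C),                      np.kron(I_q, D)])
    M = np.vstack([row1, row2])
    b = np.concatenate([E.flatten(order="F"), -F.flatten(order="F")])

    z, *_ = np.linalg.lstsq(M, b, rcond=None)
    X = z[:n * q].reshape((n, q), order="F")
    U = z[n * q:].reshape((m, q), order="F")
    return X, U

if __name__ == "__main__":
    # Hypothetical toy example: double integrator tracking a sinusoidal reference.
    A = np.array([[0.0, 1.0], [0.0, 0.0]])
    B = np.array([[0.0], [1.0]])
    C = np.array([[1.0, 0.0]])
    D = np.zeros((1, 1))
    S = np.array([[0.0, 1.0], [-1.0, 0.0]])   # exosystem generating sin/cos signals
    E = np.zeros((2, 2))
    F = np.array([[-1.0, 0.0]])               # tracking error e = x1 - w1
    X, U = solve_regulator_equations(A, B, C, D, E, F, S)
    print("X =\n", X)
    print("U =\n", U)
    # A stabilizing gain K then yields u = -K (x - X w) + U w, driving e -> 0.
```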