Undiscounted reinforcement learning for infinite-time optimal output tracking and disturbance rejection of discrete-time LTI systems with unknown dynamics

Ali Amirparast, S Kamal Hosseini Sani

doi:10.1080/00207721.2023.2221240

Abstract

This paper proposes a novel control structure to solve the infinite-time linear quadratic tracking (LQT) problem. The major challenge in the LQT problem is keeping the cost function bounded over an infinite time horizon. Many studies introduce a discount factor to overcome this challenge; however, the discount factor can affect the stability of the closed-loop system and the steady-state error. This paper proposes an optimal control structure that guarantees zero steady-state error with a bounded cost function, without using a discount factor. The optimal gains of the proposed control structure are computed via model-based and model-free reinforcement learning (RL) algorithms. As a novelty among model-based RL algorithms, a model predictive RL algorithm is proposed to reduce the number of iterations in the learning phase. A model-free RL algorithm is used to obtain the optimal tracking control online, without any knowledge of the system dynamics. Finally, simulation results verify the advantages of the proposed optimal control structure.
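To make the idea of computing optimal gains by model-based RL more concrete, the following is a minimal sketch of classical policy iteration for an undiscounted discrete-time linear quadratic regulator. It is not the control structure or the model predictive RL algorithm proposed in the paper, and it covers only regulation, not tracking. The system matrices `A` and `B`, the weights `Q` and `R`, the initial stabilising gain `K0`, and the function names are all hypothetical values chosen for illustration.

```python
import numpy as np


def solve_lyapunov_dt(A_cl, Q_cl):
    """Solve P = A_cl^T P A_cl + Q_cl by vectorisation (small systems only)."""
    n = A_cl.shape[0]
    # With column-major vec: vec(A^T P A) = (A^T kron A^T) vec(P)
    M = np.eye(n * n) - np.kron(A_cl.T, A_cl.T)
    vec_P = np.linalg.solve(M, Q_cl.reshape(-1, order="F"))
    return vec_P.reshape(n, n, order="F")


def policy_iteration_lqr(A, B, Q, R, K0, max_iters=50, tol=1e-10):
    """Policy iteration for u_k = -K x_k; K0 must stabilise A - B K0."""
    K = K0
    for _ in range(max_iters):
        # Policy evaluation: undiscounted cost matrix of the current gain
        A_cl = A - B @ K
        P = solve_lyapunov_dt(A_cl, Q + K.T @ R @ K)
        # Policy improvement: greedy gain with respect to P
        K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        converged = np.linalg.norm(K_new - K) < tol
        K = K_new
        if converged:
            break
    return K, P


# Hypothetical example: discretised double integrator (assumed values)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 2.0]])  # assumed stabilising initial gain

K, P = policy_iteration_lqr(A, B, Q, R, K0)
spectral_radius = max(abs(np.linalg.eigvals(A - B @ K)))
```

Each iteration evaluates the current gain with a Lyapunov equation and then improves it greedily. The evaluation step requires a stabilising gain, which is why `K0` cannot be arbitrary here. No discount factor appears in the cost, so the cost stays bounded only because the regulated state converges to zero; for a nonzero reference this no longer holds in general, which is the boundedness issue the abstract refers to.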
