Abstract
In this paper, an improved off-policy reinforcement learning (RL) algorithm with a neural network (NN) observer is proposed to solve the linear quadratic tracking (LQT) problem for continuous-time (CT) systems without any knowledge of the system dynamics. The offline algorithm solves a Lyapunov equation to find an optimal solution, which requires complete knowledge of the system dynamics. In previous research, an off-policy RL algorithm was later used to solve the state-feedback control problem; it requires no knowledge of the system dynamics and repeatedly reuses the same input and state data. The proposed output-feedback (OPFB) control algorithm solves the Bellman equation, which requires the system state, by using an adaptive NN state observer to estimate the state from the input and output data of the CT system. Simulation results demonstrate the effectiveness of the proposed approach.
Key Words: Off-policy, Reinforcement Learning (RL), Linear Quadratic Tracking (LQT), Output-feedback (OPFB)
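To make the model-based baseline concrete, the following is a minimal sketch of an offline policy iteration that solves a Lyapunov equation at each step. It is not the paper's algorithm: it assumes a scalar regulation problem (dx/dt = a*x + b*u with cost integrand q*x^2 + r*u^2) instead of the full LQT setting, and the names `offline_policy_iteration` and `riccati_scalar` are hypothetical. It illustrates why this route needs the dynamics: both `a` and `b` appear explicitly in every update.

```python
import math


def offline_policy_iteration(a, b, q, r, k0, iters=20):
    """Illustrative scalar policy iteration for dx/dt = a*x + b*u, u = -k*x.

    Assumes the model (a, b) is fully known, which is exactly the
    requirement that model-free off-policy RL methods aim to remove.
    """
    k = k0
    if a - b * k >= 0:
        raise ValueError("initial gain k0 must stabilize the system")
    p = None
    for _ in range(iters):
        # Policy evaluation: scalar Lyapunov equation
        #   2*(a - b*k)*p + q + r*k**2 = 0
        p = (q + r * k * k) / (2.0 * (b * k - a))
        # Policy improvement: k = b*p/r
        k = b * p / r
    return p, k


def riccati_scalar(a, b, q, r):
    """Positive root of the scalar Riccati equation 2*a*p - b**2*p**2/r + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


if __name__ == "__main__":
    p, k = offline_policy_iteration(a=1.0, b=1.0, q=1.0, r=1.0, k0=3.0)
    print(p, k, riccati_scalar(1.0, 1.0, 1.0, 1.0))
```

Starting from any stabilizing gain, the iterates in this sketch converge to the Riccati solution, so the closed-form root serves as a check on the result.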