Abstract

This paper investigates the output feedback (OPFB) tracking control problem for discrete-time linear (DTL) systems with unknown dynamics. Using an augmented-system approach, the tracking control problem is first converted into a regulation problem with a discounted performance function, whose solution relies on the Q-function-based Bellman equation. A novel value iteration (VI) scheme based on a reinforcement Q-learning mechanism is then proposed for solving the Q-function Bellman equation without knowledge of the system dynamics. Moreover, the convergence of the VI-based Q-learning is proved by showing that it converges to the solution of the Q-function Bellman equation and introduces no bias in the solution, even under probing noise satisfying the persistent excitation (PE) condition. As a result, the OPFB tracking controller can be learned online using past input, output, and reference-trajectory data of the augmented system. The proposed scheme removes the requirement of an initial admissible policy needed in the policy iteration (PI) method. Finally, the effectiveness of the proposed scheme is demonstrated through a simulation example.
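As a rough illustration of the VI-based Q-learning idea, the sketch below fits a quadratic Q-function by least squares at each value-iteration step. It simplifies the paper's setting in several assumed ways: full state feedback rather than OPFB, an undiscounted cost, and toy dynamics used only to generate transition data (the learner itself never uses the system matrices). It is a sketch under these assumptions, not the paper's algorithm.

```python
import numpy as np

# Minimal sketch of value-iteration (VI) Q-learning with a quadratic Q-function.
# Assumptions: full state feedback, undiscounted cost, illustrative toy dynamics
# used only to simulate transitions -- the learner never reads A or B directly.
rng = np.random.default_rng(0)
A = np.array([[0.8, 0.2], [0.0, 0.7]])
B = np.array([[0.0], [1.0]])
Qc, Rc = np.eye(2), np.eye(1)          # state and input cost weights
n, m = 2, 1
p = n + m
iu = np.triu_indices(p)

def phi(z):
    # quadratic features so that phi(z) @ theta == z^T H z for symmetric H
    M = np.outer(z, z)
    M = 2.0 * M - np.diag(np.diag(M))  # double off-diagonals, keep diagonal
    return M[iu]

# collect transitions under a persistently exciting random probing input
X, U, Xn = [], [], []
x = rng.standard_normal(n)
for _ in range(300):
    u = rng.standard_normal(m)
    xn = A @ x + B @ u
    X.append(x); U.append(u); Xn.append(xn)
    x = xn

H = np.zeros((p, p))  # VI starts from H = 0: no admissible initial policy needed
for _ in range(60):
    Hxx, Hxu, Huu = H[:n, :n], H[:n, n:], H[n:, n:]
    P = Hxx - Hxu @ np.linalg.pinv(Huu) @ Hxu.T        # value matrix of current H
    Phi = np.array([phi(np.concatenate([x0, u0])) for x0, u0 in zip(X, U)])
    y = np.array([x0 @ Qc @ x0 + u0 @ Rc @ u0 + x1 @ P @ x1
                  for x0, u0, x1 in zip(X, U, Xn)])    # VI Bellman targets
    theta = np.linalg.lstsq(Phi, y, rcond=None)[0]     # fit next Q-function
    Hn = np.zeros((p, p)); Hn[iu] = theta
    H = Hn + Hn.T - np.diag(np.diag(Hn))               # symmetrize

K = np.linalg.solve(H[n:, n:], H[:n, n:].T)            # learned gain: u = -K x
```

Because each VI step needs only a least-squares fit of measured transitions, no stabilizing initial gain is required, which is the property the PI-based alternative lacks.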

Highlights

  • For the controller design problem, optimizing performance costs is an important concern, since it can reduce energy effort and thereby benefit the environment

  • A simulation example verifies the effectiveness of the developed output feedback (OPFB) Q-learning algorithm based on the value iteration (VI) scheme

  • Compared with the policy iteration (PI)-based Algorithm 1, the VI-based Algorithm 2 is verified to remove the requirement of an initial stabilizing control policy



Introduction

Optimization of performance costs has been an important concern, since it may reduce energy effort, which in turn has positive consequences for the environment. The solution of the Riccati equation can be obtained efficiently by iterative computational algorithms [2], [3], which are, however, applicable only when complete knowledge of the system dynamics is available. It is often desirable in control engineering to design online learning controllers without resorting to the system dynamics [4]–[8]. Notice that a data-based method has been proposed in [9] to analyze
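The iterative computation referred to above can be sketched as value iteration on the discrete-time algebraic Riccati equation (DARE), starting from P = 0 so that no stabilizing initial gain is needed. The system matrices below are illustrative placeholders, not taken from the paper; note this model-based recursion is exactly what requires the complete system knowledge that the data-based methods aim to avoid.

```python
import numpy as np

# Value iteration for the discrete-time algebraic Riccati equation (DARE).
# A, B, Q, R are illustrative placeholders, not taken from the paper.
A = np.array([[0.8, 0.2], [0.0, 0.7]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)          # state weighting
R = np.eye(1)          # input weighting

P = np.zeros((2, 2))   # VI may start from P = 0: no stabilizing guess required
for _ in range(500):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # greedy gain at step k
    P = Q + A.T @ P @ (A - B @ K)                      # Riccati recursion
# P approximates the DARE fixed point; u = -K x is the optimal LQR policy
```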

