Abstract
This paper investigates the output feedback (OPFB) tracking control problem for discrete-time linear (DTL) systems with unknown dynamics. Using an augmented system approach, the tracking control problem is first converted into a regulation problem with a discounted performance function, whose solution relies on the Q-function based Bellman equation. A novel value iteration (VI) scheme based on the reinforcement Q-learning mechanism is then proposed to solve the Q-function Bellman equation without knowledge of the system dynamics. Moreover, convergence of the VI based Q-learning is proved by showing that it converges to the solution of the Q-function Bellman equation and that it introduces no bias in the solution, even when probing noise satisfying the persistent excitation (PE) condition is injected. As a result, the OPFB tracking controller can be learned online using past input, output, and reference trajectory data of the augmented system. The proposed scheme removes the requirement of an initial admissible policy imposed by the policy iteration (PI) method. Finally, the effectiveness of the proposed scheme is demonstrated through a simulation example.
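For orientation, the block below is a minimal sketch, in generic notation, of a discounted Q-function Bellman equation and the value-iteration recursion underlying such a scheme. The augmented state X_k, weights Q and R, and discount factor γ are generic symbols assumed here for illustration; the paper's exact augmented-system definitions may differ.

```latex
% Discounted Q-function Bellman equation for the augmented system
% (generic symbols; not the paper's exact definitions):
Q^{*}(X_k, u_k) = X_k^{\top} Q X_k + u_k^{\top} R u_k
                  + \gamma \min_{u} Q^{*}(X_{k+1}, u), \qquad 0 < \gamma < 1.

% Value-iteration recursion, initialized with Q_0 \equiv 0, so that no
% initial admissible (stabilizing) policy is required:
Q_{j+1}(X_k, u_k) = X_k^{\top} Q X_k + u_k^{\top} R u_k
                    + \gamma \min_{u} Q_{j}(X_{k+1}, u), \qquad j = 0, 1, 2, \dots
```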
Highlights
For the controller design problem, optimization of the performance cost has long been an important concern, since it can reduce the control energy effort and thereby benefit the environment.
A simulation example is presented to verify the effectiveness of the developed output feedback (OPFB) Q-learning algorithm based on the value iteration (VI) scheme.
Compared with the policy iteration (PI)-based Algorithm 1, the VI-based Algorithm 2 is verified to remove the requirement of an initial stabilizing control policy (a generic sketch of this idea follows below).
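The following Python sketch illustrates the general idea of value-iteration Q-learning for a linear quadratic problem using only sampled trajectory data and a zero initial Q-function. It is a state-feedback simplification under assumed matrices A, B, Qc, R and discount factor gamma, not the authors' output feedback Algorithm 2; the model is used only to generate data, never inside the learning loop.

```python
import numpy as np

# Hypothetical 2-state, 1-input system used only to generate trajectory data;
# the learning loop itself never uses A or B.
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])
Qc, R, gamma = np.eye(2), np.eye(1), 0.8
n, m = 2, 1
nz = n + m

def phi(z):
    # Quadratic basis: entries z_i * z_j for j >= i (symmetric Q-function kernel).
    return np.concatenate([z[i] * z[i:] for i in range(nz)])

# Collect data with a probing (persistently exciting) input.
rng = np.random.default_rng(0)
x, data = rng.standard_normal(n), []
for _ in range(200):
    u = rng.standard_normal(m)
    x_next = A @ x + B @ u
    data.append((x, u, x_next))
    x = x_next

H = np.zeros((nz, nz))  # value iteration starts from Q_0 = 0: no stabilizing policy needed
for _ in range(50):
    Hxx, Hxu, Huu = H[:n, :n], H[:n, n:], H[n:, n:]
    # "min over u" of the current quadratic Q-function yields a value kernel P.
    P = Hxx - Hxu @ np.linalg.pinv(Huu) @ Hxu.T
    Phi, y = [], []
    for xt, ut, xn in data:
        z = np.concatenate([xt, ut])
        r = xt @ Qc @ xt + ut @ R @ ut          # stage cost
        y.append(r + gamma * (xn @ P @ xn))     # Bellman (VI) target
        Phi.append(phi(z))
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    # Rebuild the symmetric kernel H_{j+1} from the least-squares parameters.
    H_new, idx = np.zeros((nz, nz)), 0
    for i in range(nz):
        for j in range(i, nz):
            H_new[i, j] = H_new[j, i] = theta[idx] / (1.0 if i == j else 2.0)
            idx += 1
    H = H_new

K = np.linalg.pinv(H[n:, n:]) @ H[n:, :n]       # learned feedback gain, u = -K x
print("learned gain K =", K)
```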
Summary
Optimization of the performance cost has been an important concern, since it can reduce the control energy effort and thereby benefit the environment. The solution of the Riccati equation can be obtained efficiently by iterative computational algorithms [2], [3], which are, however, only applicable when complete knowledge of the system dynamics is available. In control engineering, it is often desirable to design online learning controllers without resorting to the system dynamics [4]–[8]. Notice that a data-based method has been proposed in [9] to analyze
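As a contrast, below is a minimal sketch of a standard iterative (value-iteration type) solver for the discrete-time algebraic Riccati equation. The matrices A, B, Q, R are hypothetical, and every update uses them explicitly, which illustrates why such model-based solvers require complete knowledge of the system dynamics, unlike the data-driven Q-learning scheme above.

```python
import numpy as np

# Hypothetical system and weights; the model-based Riccati iteration
# needs all of them explicitly at every step.
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)

P = np.zeros((2, 2))
for _ in range(500):
    # P_{k+1} = Q + A'PA - A'PB (R + B'PB)^{-1} B'PA
    P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # optimal gain, u = -K x
print("Riccati solution P =\n", P)
print("optimal gain K =", K)
```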