Abstract
In this paper, a novel Virtual State-Feedback Reference Tuning (VSFRT) method and Approximate Iterative Value Iteration Reinforcement Learning (AI-VIRL) are applied to learning linear reference model output (LRMO) tracking control of observable systems with unknown dynamics. For the observable system, a new state representation in terms of input/output (IO) data is derived. Consequently, the Virtual Reference Feedback Tuning (VRFT)-based solution is redefined to accommodate virtual state-feedback control, leading to an original stability-certified VSFRT concept. Both VSFRT and AI-VIRL use neural network controllers. We find that AI-VIRL is significantly more computationally demanding and more sensitive to the exploration settings, while leading to inferior LRMO tracking performance compared to VSFRT. Transfer-learning the VSFRT control as an initialization for AI-VIRL does not help either. State dimensionality reduction using machine learning techniques such as principal component analysis and autoencoders does not improve on the best learned tracking performance; however, it trades off the learning complexity. Surprisingly, unlike AI-VIRL, the VSFRT control is one-shot (non-iterative) and learns stabilizing controllers even in poorly explored open-loop environments, proving superior for learning LRMO tracking control. Validation on two nonlinear, coupled, multivariable complex systems serves as a comprehensive case study.
Highlights
Learning control from input/output (IO) system data is a current significant research area
One contribution of this work is to propose for the first time such a model-free framework called Virtual State-Feedback Reference Tuning (VSFRT), which learns control based on the feedback provided by the virtual state representation
Since the virtual state-feedback control design is attempted based on the VSFRT principle, it is known from classical Virtual Reference Feedback Tuning (VRFT) control that the non-minimum-phase (NMP) property of (1) requires special care; for simplification, it will be assumed that (1) is minimum-phase
Summary
Learning control from input/output (IO) system data is a current significant research area. The class of offline off-policy VIRL for unknown dynamical systems is adopted, based on neural network (NN) function approximators; it is coined Approximate Iterative VIRL (AI-VIRL). For this model-free, offline, off-policy learning variant, a database of transition samples (or experiences) is required to learn the optimal control. This is different from the virtual environments specific, e.g., to video games [14] (or even simulated mechatronic systems), where instability leads to an episode (or simulation) termination but physical damage is not a threat. Another issue with reinforcement learning algorithms such as AI-VIRL is the state representation. One contribution of this work is to propose for the first time such a model-free framework, called Virtual State-Feedback Reference Tuning (VSFRT), which learns control based on the feedback provided by the virtual state representation.
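To illustrate the idea of a virtual state built from IO data, the following is a minimal sketch, assuming (as is common for observable systems, though the paper's exact delay structure may differ) that the virtual state stacks the most recent output and input samples; the function name and window length are hypothetical.

```python
import numpy as np

def virtual_state(u_hist, y_hist, n):
    """Illustrative virtual state: stack the last n output samples and
    the last n input samples into one vector. The exact delay structure
    used in the paper is not reproduced here; this is an assumption."""
    return np.concatenate([y_hist[-n:], u_hist[-n:]])

# Example usage with a short IO history and window n = 2
u = np.array([0.1, 0.2, 0.3, 0.4])
y = np.array([1.0, 1.1, 1.2, 1.3])
s = virtual_state(u, y, 2)
print(s)  # [1.2 1.3 0.3 0.4]
```

A controller can then be learned as a static feedback on this virtual state, which is the premise both VSFRT and AI-VIRL build on here.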