Abstract
In this paper, a novel Virtual State-Feedback Reference Tuning (VSFRT) method and Approximate Iterative Value Iteration Reinforcement Learning (AI-VIRL) are applied to learning linear reference model output (LRMO) tracking control of observable systems with unknown dynamics. For the observable system, a new state representation in terms of input/output (IO) data is derived. Consequently, the Virtual Reference Feedback Tuning (VRFT)-based solution is redefined to accommodate virtual state-feedback control, leading to an original stability-certified VSFRT concept. Both VSFRT and AI-VIRL use neural network controllers. We find that AI-VIRL is significantly more computationally demanding and more sensitive to the exploration settings, while leading to inferior LRMO tracking performance compared to VSFRT. Transfer-learning the VSFRT control as an initialization for AI-VIRL does not help either. State dimensionality reduction using machine learning techniques such as principal component analysis and autoencoders does not improve on the best learned tracking performance; however, it trades off the learning complexity. Surprisingly, unlike AI-VIRL, the VSFRT control is one-shot (non-iterative) and learns stabilizing controllers even in poorly explored open-loop environments, proving superior for learning LRMO tracking control. Validation on two nonlinear, coupled, multivariable complex systems serves as a comprehensive case study.
Highlights
Learning control from input/output (IO) system data is a current significant research area
One contribution of this work is to propose for the first time such a model-free framework called Virtual State-Feedback Reference Tuning (VSFRT), which learns control based on the feedback provided by the virtual state representation
Since the virtual state-feedback control design is attempted based on the VSFRT principle, it is known from classical Virtual Reference Feedback Tuning (VRFT) control that the non-minimum-phase (NMP) property of (1) requires special care; for simplification, it will be assumed that (1) is minimum-phase
Summary
Learning control from input/output (IO) system data is a current significant research area. The class of offline off-policy VIRL for unknown dynamical systems is adopted, based on neural network (NN) function approximators; it is coined Approximate Iterative VIRL (AI-VIRL). For this model-free, offline, off-policy learning variant, a database of transition samples (or experiences) is required to learn the optimal control. This is different from the virtual environments specific, e.g., to video games [14] (or even simulated mechatronic systems), where instability leads to an episode (or simulation) termination but physical damage is not a threat. Another issue with reinforcement learning algorithms such as AI-VIRL is the state representation. One contribution of this work is to propose for the first time such a model-free framework, called Virtual State-Feedback Reference Tuning (VSFRT), which learns control based on the feedback provided by the virtual state representation.
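To illustrate the idea of a virtual state built from IO data, the following is a minimal sketch, assuming (as is common for observable systems, though the paper's exact delay structure may differ) that the virtual state stacks the most recent output and input samples; the function name and window length are hypothetical.

```python
import numpy as np

def virtual_state(u_hist, y_hist, n):
    """Illustrative virtual state: stack the last n output samples and
    the last n input samples into one vector. The exact delay structure
    used in the paper is not reproduced here; this is an assumption."""
    return np.concatenate([y_hist[-n:], u_hist[-n:]])

# Example usage with a short IO history and window n = 2
u = np.array([0.1, 0.2, 0.3, 0.4])
y = np.array([1.0, 1.1, 1.2, 1.3])
s = virtual_state(u, y, 2)
print(s)  # [1.2 1.3 0.3 0.4]
```

A controller can then be learned as a static feedback on this virtual state, which is the premise both VSFRT and AI-VIRL build on here.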