Tuning heuristics and convergence analysis of reinforcement learning algorithm for online data-based optimal control design

Fábio Nogueira Da Silva, João Viana Fonseca Neto

doi:10.33448/rsd-v9i2.2128

Open Access

https://doi.org/10.33448/rsd-v9i2.2128

Abstract

This paper presents a heuristic for tuning, and for analysing the convergence of, a reinforcement learning algorithm for output-feedback control that uses only input/output data generated by a model. Convergence analysis requires adjusting the parameters of the algorithms used for data generation and solving the control problem iteratively. The proposed heuristic adjusts the data-generator parameters and builds surfaces that support the convergence and robustness analysis of the online optimal control methodology. The algorithm tested is the discrete linear quadratic regulator (DLQR) with output feedback, which uses temporal difference learning within a policy iteration scheme to determine the optimal policy from input/output data only. Within the policy iteration algorithm, recursive least squares (RLS) estimates online the parameters associated with the output-feedback DLQR. After the proposed tuning heuristic was applied, the influence of the parameters could be clearly seen and the convergence analysis was facilitated.

Highlights

The system states gather information from the system dynamics, either for control or for monitoring purposes, in various kinds of systems, such as industrial, aerospace, energy, health, and economic systems
A heuristic is presented for tuning, and for analysing the convergence of, a reinforcement learning algorithm for output-feedback control that uses only input/output data generated by a model
The algorithm tested is the discrete linear quadratic regulator (DLQR) with output feedback, which uses temporal difference learning within a policy iteration scheme to determine the optimal policy from input/output data only

Summary

INTRODUCTION

The system states gather information from the system dynamics, either for control or for monitoring purposes, in various kinds of systems, such as industrial, aerospace, energy, health, and economic systems. [...] 2001) and (Nechyba & Xu, 1994). Each of these methods has its own set of parameters to be tuned in order to achieve a desired performance. System identification techniques are widely applied to process control by researchers, as are state observers for estimating system states that cannot be measured directly. Whatever the model-free, data-driven, or heuristic algorithm or methodology used, there are parameters and initial conditions to be set that influence the performance, the training, and the convergence speed with which the chosen method solves a problem. In this paper we propose a heuristic for tuning and convergence analysis of a model-free optimal control problem based on the discrete linear quadratic regulator with output feedback, using a reinforcement learning algorithm in the presence of noise.
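To make the role of tunable parameters and initial conditions concrete, the sketch below shows a generic recursive least squares (RLS) update applied to a synthetic noisy linear regression. It is not the paper's output-feedback DLQR estimator: the function names `rls_step` and `run_rls`, the initial covariance scale `p0`, the forgetting factor `lam`, the noise level `noise_std`, and the test parameter vector are all assumptions chosen for illustration.

```python
import random


def rls_step(theta, P, phi, y, lam=1.0):
    """One RLS update with forgetting factor lam (lam=1.0 means no forgetting)."""
    n = len(theta)
    # P @ phi; P is symmetric, so phi^T P is the transpose of this vector.
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    # Prediction error for the new sample.
    err = y - sum(phi[i] * theta[i] for i in range(n))
    theta_new = [theta[i] + gain[i] * err for i in range(n)]
    # Covariance update: P <- (P - K phi^T P) / lam.
    P_new = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(n)]
             for i in range(n)]
    return theta_new, P_new


def run_rls(true_theta, n_samples=200, p0=1000.0, lam=1.0,
            noise_std=0.01, seed=0):
    """Estimate true_theta from noisy samples y = phi^T true_theta + noise.

    p0, lam and noise_std are the illustrative tuning knobs: they set the
    initial covariance, the forgetting factor, and the measurement noise.
    """
    rng = random.Random(seed)
    n = len(true_theta)
    theta = [0.0] * n  # initial parameter estimate
    P = [[p0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(n_samples):
        phi = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        y = sum(t * p for t, p in zip(true_theta, phi)) + rng.gauss(0.0, noise_std)
        theta, P = rls_step(theta, P, phi, y, lam)
    return theta


# Hypothetical parameter vector used only to exercise the estimator.
estimate = run_rls([2.0, -1.0, 0.5])
```

Sweeping `p0`, `lam`, and `noise_std` over a grid and recording the final estimation error is one simple way to produce the kind of parameter-versus-performance surface that a tuning heuristic could inspect, under the assumptions stated above.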

PRELIMINARIES

Value Function Formulation in Terms of Measured Data

Temporal Difference Error Based on Measured Data

Writing Policy Update in Terms of Measured Data

TUNING PROBLEM FORMULATION

PROPOSED METHODOLOGY

Convergence Analysis

SIMULATION AND CONVERGENCE ANALYSIS

Initial Setup and Computational Simulations

Journal: Research, Society and Development
Publication Date: Jan 1, 2020
Citations: 2
License type: CC BY 4.0

Similar Papers

Convergence of the standard RLS method and UDUT factorisation of covariance matrix for solving the algebraic Riccati equation of the DLQR via heuristic approximate dynamic programming
Patrícia Helena Moraes Rêgo ... Ernesto M Ferreira
International Journal of Systems Science | VOL. 46
17 Oct 2013

Computational Performance of State-Value Function Approximators Based on RLS-HDP Estimators for Online DLQR Control System Design
Ernesto F M Ferreira ... Patrícia Helena Moraes Rêgo
01 Apr 2016

Hierarchical Multiagent Formation Control Scheme via Actor-Critic Learning.
Chaoxu Mu ... Jiangwen Peng
IEEE Transactions on Neural Networks and Learning Systems | VOL. 34
01 Nov 2023

Convergence Analysis using non-squares estimators to approximate the solution of HJB-Riccati equation for the design DLQR via HDP
Jonathan A Queiroz ... Allan Kardec Barros
01 Mar 2014
