Abstract

A heuristic for tuning and convergence analysis of a reinforcement learning algorithm for output-feedback control, using only input/output data generated by a model, is presented. To support convergence analysis, the parameters of the data-generating algorithms must be adjusted and the control problem solved iteratively. The proposed heuristic adjusts the data-generator parameters, producing surfaces that assist in analyzing the convergence and robustness of the online optimal control methodology. The algorithm tested is the discrete linear quadratic regulator (DLQR) with output feedback, based on reinforcement learning through temporal difference learning in a policy iteration scheme that determines the optimal policy from input/output data only. Within the policy iteration algorithm, recursive least squares (RLS) is used to estimate online the parameters associated with the output-feedback DLQR. After applying the proposed tuning heuristic, the influence of the parameters can be clearly seen and the convergence analysis is facilitated.

Highlights

  • The system states gather information about the system dynamics, either for control or for monitoring purposes, in many kinds of systems: industrial, aerospace, energy, health, and economic

  • A heuristic for tuning and convergence analysis of a reinforcement learning algorithm for output-feedback control, using only input/output data generated by a model, is presented

  • The algorithm tested is the discrete linear quadratic regulator (DLQR) with output feedback, based on reinforcement learning through temporal difference learning in a policy iteration scheme that determines the optimal policy from input/output data only

Summary

INTRODUCTION

The system states gather information about the system dynamics, either for control or for monitoring purposes, in many kinds of systems: industrial, aerospace, energy, health, and economic (Nechyba & Xu, 1994). Each of these methods has its own set of parameters to be tuned in order to achieve a desired performance. System identification techniques are widely applied to process control by researchers, as are state observers for estimating system states that cannot be measured directly. Regardless of the model-free, data-driven, heuristic, algorithmic, or methodological approach used, there are parameters and initial conditions to be set that influence the performance, the training, and the convergence speed of the chosen method. In this paper we propose a heuristic for tuning and convergence analysis of a model-free optimal control problem, based on the discrete linear quadratic regulator with output feedback, using a reinforcement learning algorithm in the presence of noise.
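The RLS estimator mentioned in the abstract is the core online-learning step of the policy iteration scheme: at each sample it refines the value-function parameter vector from a regressor built out of measured input/output data. As a minimal sketch of that step (the function name, the regression target, and the toy data below are illustrative, not the paper's actual implementation), a standard RLS update with a forgetting factor can be written as:

```python
import numpy as np

def rls_update(theta, P, phi, y, lam=1.0):
    """One recursive-least-squares step: refine the parameter estimate
    theta and covariance P given regressor phi, measurement y, and
    forgetting factor lam (lam = 1.0 means no forgetting)."""
    phi = phi.reshape(-1, 1)
    K = P @ phi / (lam + phi.T @ P @ phi)        # gain vector
    theta = theta + (K * (y - phi.T @ theta)).ravel()
    P = (P - K @ phi.T @ P) / lam                # covariance update
    return theta, P

# Illustrative use: recover the parameters of a noisy linear map online,
# mimicking how value-function parameters would be estimated from data.
rng = np.random.default_rng(0)
true_theta = np.array([2.0, -1.0, 0.5])
theta = np.zeros(3)
P = 1e3 * np.eye(3)                              # large initial covariance
for _ in range(200):
    phi = rng.standard_normal(3)
    y = phi @ true_theta + 0.01 * rng.standard_normal()
    theta, P = rls_update(theta, P, phi, y)
```

In the paper's setting the regressor `phi` would be assembled from past inputs and outputs rather than drawn at random, and the initial covariance `P` and forgetting factor `lam` are exactly the kind of tuning parameters whose influence the proposed heuristic is meant to expose.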

PRELIMINARIES
Value Function Formulation in Terms of Measured Data
Temporal Difference Error Based on Measured Data
Writing Policy Update in Terms of Measured Data
TUNING PROBLEM FORMULATION
PROPOSED METHODOLOGY
Convergence Analysis
SIMULATION AND CONVERGENCE ANALYSIS
Initial Setup and Computational Simulations
