Abstract

A data-driven approximation formulation for the state reconstruction problem of dynamical systems is presented in this paper. Without assuming an explicit mathematical model, a Hamilton–Jacobi–Bellman (HJB)-based, data-driven state reconstruction design method for monitoring and output-feedback control of dynamical systems is developed. The proposed state reconstruction design is based on a dynamic programming approach. To evaluate the proposed state reconstruction, computational experiments are conducted using only output data from the dynamical system model. The sensitivity of the algorithm parameters is also analyzed and discussed. Performance is evaluated in terms of the error metrics of the discrete linear quadratic regulator (DLQR) with output feedback under the value iteration algorithm within a reinforcement learning strategy.
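
For concreteness, the quantities the abstract refers to can be written in the standard quadratic form; this is a generic formulation consistent with the temporal-difference error given later in the summary, not an excerpt from the paper. The output-feedback DLQR cost and value function are

$$ J = \sum_{k=0}^{\infty}\left(y_k^{T} Q\, y_k + u_k^{T} R\, u_k\right), \qquad V(x_k) = x_k^{T} P\, x_k, $$

and value iteration drives the residual of the Bellman equation

$$ x_k^{T} P\, x_k = y_k^{T} Q\, y_k + u_k^{T} R\, u_k + x_{k+1}^{T} P\, x_{k+1} $$

toward zero using only measured data; a discount factor may additionally weight the $x_{k+1}$ term.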

Highlights

  • When monitoring and/or controlling real-world systems, relevant states often cannot be measured for a given application, owing to difficult access when installing sensors or because some state variables have no physical representation

  • State observers present a solution to this problem, enabling devices consisting of sensors, micro-controllers, and embedded algorithms to synthesize state-space observers based on adaptive dynamic programming (ADP) approaches [12]–[14]

  • In this paper, a state reconstruction method for dynamical systems based on dynamic programming and reinforcement learning approaches, driven by measured data, is presented

Summary

INTRODUCTION

When monitoring and/or controlling real-world systems, relevant states often cannot be measured for a given application, owing to difficult access when installing sensors or because some state variables have no physical representation. Since not all states can be measured for full state feedback, state observer devices enable the application of optimal control methodologies. State observers present a solution to this problem, enabling devices consisting of sensors, micro-controllers, and embedded algorithms to synthesize state-space observers based on adaptive dynamic programming (ADP) approaches [12]–[14]. Matrix P still depends on matrices A, B, and C through M0, Mu, and My.

2) TEMPORAL DIFFERENCE ERROR BASED ON MEASURED DATA

The reinforcement learning algorithm based on temporal differences, which determines the value function online, can be defined using the Bellman temporal difference error equation for the DLQR with respect to the states: $e_k = -x_k^{T} P x_k + y_k^{T} Q y_k + u_k^{T} R u_k + x_{k+1}^{T} P x_{k+1}$. The advantage here is that no dynamical system model is needed to estimate the control actions; only the measured data and tuning parameters, such as the forgetting factor, the discount factor, the RLS covariance matrix, and the weighting matrices Q and R, are required.
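
As a minimal sketch of how such a temporal-difference scheme can be driven by measured data alone, the fragment below fits the quadratic value function $V(x_k) = x_k^{T} P x_k$ by recursive least squares with a forgetting factor. The function names (quad_basis, rls_update, estimate_P), the basis parameterization, and the initial covariance value are illustrative assumptions, not the authors' implementation.

  # Sketch only: estimate the DLQR value-function matrix P from measured data by
  # driving the Bellman temporal-difference error
  #   e_k = -x_k' P x_k + y_k' Q y_k + u_k' R u_k + x_{k+1}' P x_{k+1}
  # toward zero with recursive least squares (RLS).
  import numpy as np

  def quad_basis(x):
      """Quadratic basis such that x' P x = theta' phi(x); theta holds the
      diagonal entries of P and twice the off-diagonal entries."""
      n = len(x)
      return np.array([x[i] * x[j] for i in range(n) for j in range(i, n)])

  def rls_update(theta, P_cov, psi, target, forgetting=0.98):
      """One RLS step with a forgetting factor (a tuning parameter)."""
      gain = P_cov @ psi / (forgetting + psi @ P_cov @ psi)
      theta = theta + gain * (target - psi @ theta)
      P_cov = (P_cov - np.outer(gain, psi @ P_cov)) / forgetting
      return theta, P_cov

  def estimate_P(xs, ys, us, Q, R, gamma=1.0, forgetting=0.98):
      """Fit theta (i.e., P) from measured trajectories xs, ys, us."""
      m = xs.shape[1] * (xs.shape[1] + 1) // 2
      theta = np.zeros(m)
      P_cov = 1e3 * np.eye(m)          # RLS covariance matrix (tuning parameter)
      for k in range(len(us)):
          psi = quad_basis(xs[k]) - gamma * quad_basis(xs[k + 1])
          target = ys[k] @ Q @ ys[k] + us[k] @ R @ us[k]   # one-step utility
          theta, P_cov = rls_update(theta, P_cov, psi, target, forgetting)
      return theta                     # packed entries of P (see quad_basis)

For example, given recorded trajectories xs (N+1 samples), ys, and us (N samples each), estimate_P(xs, ys, us, Q, R) returns the packed entries of P, with the forgetting factor, discount factor, and RLS covariance acting as the tuning parameters listed above.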

MATRIX APPROXIMATION AND STATE
SIMULATION AND ANALYSIS
CONCLUSION