Abstract

Conventional closed-form solutions to the optimal control problem using optimal control theory are available only when the system dynamics/models are known and described as differential equations. Without such models, reinforcement learning (RL) has been successfully applied as a candidate technique to iteratively solve the optimal control problem for unknown or varying systems. For the optimal tracking control problem, existing RL techniques in the literature rely on either a predetermined feedforward input for the tracking control, restrictive assumptions on the reference model dynamics, or discounted tracking costs. Furthermore, when discounted tracking costs are used, the existing RL methods cannot guarantee zero steady-state error. This article therefore presents an optimal online RL tracking control framework for discrete-time (DT) systems, which does not impose the restrictive assumptions of the existing methods and also guarantees zero steady-state tracking error. This is achieved by augmenting the original system dynamics with the integral of the error between the reference inputs and the tracked outputs for use in the online RL framework. It is further shown that the resulting value function for the DT linear quadratic tracker using the augmented formulation with integral control is also quadratic. This enables the development of Bellman equations that use only system measurements to solve the corresponding DT algebraic Riccati equation and obtain the optimal tracking control inputs online. Two RL strategies, based on value function approximation and on Q-learning, are thereafter proposed, along with bounds on the excitation required for the convergence of the parameter estimates. Simulation case studies show the effectiveness of the proposed approach.
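
To make the augmented formulation concrete, the following is a brief sketch under assumed notation (the symbols, state ordering, and weighting matrices below are illustrative and not necessarily those used in the article). Given DT dynamics $x_{k+1} = A x_k + B u_k$, $y_k = C x_k$, and a reference $r_k$, an integral-error state

$$\varepsilon_{k+1} = \varepsilon_k + (r_k - y_k)$$

can be appended to the plant state, for example $z_k = [x_k^\top \; \varepsilon_k^\top]^\top$, so that the tracking task becomes a regulation task on the augmented state. With the quadratic cost

$$V(z_k) = \sum_{i=k}^{\infty} \left( z_i^\top Q_1 z_i + u_i^\top R u_i \right),$$

the optimal value function is also quadratic, $V^*(z_k) = z_k^\top P z_k$, with $P$ solving the associated DT algebraic Riccati equation, and the Bellman equation

$$z_k^\top P z_k = z_k^\top Q_1 z_k + u_k^\top R u_k + z_{k+1}^\top P z_{k+1}$$

involves only the measured data $(z_k, u_k, z_{k+1})$. Note, too, the standard integral-action argument for zero steady-state error: at any constant-reference equilibrium of the closed loop, $\varepsilon_{k+1} = \varepsilon_k$ forces $r_k - y_k = 0$.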

Highlights

  • Reinforcement learning (RL) is a type of machine learning technique that has been used extensively in the area of computing and artificial intelligence to solve complex optimization problems.[1,2] Due to its successes, there have been concerted efforts by researchers in the control community to explore the overlap between RL and optimal control theory, which usually involves solving the general-purpose Hamilton-Jacobi-Bellman (HJB) equations.

  • In contrast to the approaches discussed in References 30, 31, and 34-36, the proposed framework removes the need for a predetermined feedforward control input, restrictive assumptions on the reference model dynamics, or a discounted tracking cost, all of which limit the practical applications of existing online tracking RL approaches.

  • A new augmented formulation for the online optimal tracking control problem is proposed that guarantees zero steady-state tracking error without imposing any restrictive assumptions on the reference dynamics or a discounted performance cost, thereby overcoming the limitations of the existing strategies.



INTRODUCTION

Reinforcement learning (RL) is a type of machine learning technique that has been used extensively in the area of computing and artificial intelligence to solve complex optimization problems.[1,2] Due to its successes, there have been concerted efforts by researchers in the control community to explore the overlap between RL and optimal control theory, which usually involves solving the general-purpose Hamilton-Jacobi-Bellman (HJB) equations. One class of approaches learns a model of the system dynamics online, but it generally requires pretrained models and assumes fixed model structures for the identification. In contrast to these approaches, strategies that employ the augmented formulation obviate the need for a predetermined feedforward control input by transforming the tracking control into a regulation problem using augmented system states. In contrast to the approaches discussed in References 30, 31, and 34-36, the proposed framework removes the need for either a predetermined feedforward control input, restrictive assumptions on the reference model dynamics, or a discounted tracking cost, which limit the practical applications of existing online tracking RL approaches.
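
As an illustration of how such measurement-only updates might be realized, the sketch below implements a generic Q-learning-style policy iteration for a DT linear quadratic tracker in Python. The function name, the quadratic Q-function parameterization, and the batch least-squares fit are assumptions made here for exposition; the article's own VFA and Q-function-based algorithms, notation, and excitation bounds may differ.

import numpy as np

def q_learning_lqt(data, n_z, n_u, Q1, R, K, n_iters=10):
    """Illustrative Q-learning-style policy iteration for a DT LQ tracker.

    data : list of measured transitions (z, u, z_next), where z is the
           augmented state (plant state plus integral tracking error) and the
           exploration input is assumed to be persistently exciting.
    K    : initial stabilizing gain for the policy u = -K z.
    """
    n = n_z + n_u
    H = np.zeros((n, n))
    for _ in range(n_iters):
        # Policy evaluation: fit H in Q(z, u) = [z; u]^T H [z; u] from the
        # Bellman equation, using only measured data (no model of A, B).
        Phi, targets = [], []
        for z, u, z_next in data:
            w = np.concatenate([z, u])
            u_next = -K @ z_next                     # action under current policy
            w_next = np.concatenate([z_next, u_next])
            Phi.append(np.kron(w, w) - np.kron(w_next, w_next))
            targets.append(z @ Q1 @ z + u @ R @ u)   # one-step tracking cost
        theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(targets), rcond=None)
        H = 0.5 * (theta.reshape(n, n) + theta.reshape(n, n).T)  # symmetrize
        # Greedy policy improvement: u = -inv(H_uu) @ H_uz @ z.
        H_uu = H[n_z:, n_z:]
        H_uz = H[n_z:, :n_z]
        K = np.linalg.solve(H_uu, H_uz)
    return K, H

In practice, the transition data would be collected while a sufficiently exciting probing signal is added to the control input, which is what the bounds on excitation discussed in the article are meant to ensure.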

PROBLEM FORMULATION
AUGMENTED FORMULATION FOR THE OPTIMAL TRACKING PROBLEM WITH INTEGRAL CONTROL
MODEL-BASED SOLUTION TO THE AUGMENTED LQT FORMULATION WITH INTEGRAL CONTROL
VFA-based RL algorithm
Q-function-based RL algorithm
SIMULATION CASE STUDIES
Case study 1
VFA-based RL adaptation
QFA-based RL adaptation
Case study 2
CONCLUSIONS
