Online Optimal Control of Robotic Systems with Single Critic NN-Based Reinforcement Learning

Xiaoyi Long, Zheng He, Zhongyuan Wang, Jing Na

doi:10.1155/2021/8839391

Abstract

This paper proposes an online solution for the optimal tracking control of robotic systems using a reinforcement learning (RL) method based on a single critic neural network (NN). To this end, we rewrite the robotic system model in state-space form, which facilitates the synthesis of the optimal tracking control. To maintain the tracking response, a steady-state control is designed, and an adaptive optimal tracking control is then used to ensure that the tracking error converges in an optimal sense. To solve the resulting optimal control problem within the framework of adaptive dynamic programming (ADP), the command trajectory to be tracked and the modified tracking Hamilton-Jacobi-Bellman (HJB) equation are formulated. An online RL algorithm is then developed to solve the HJB equation using a critic NN with an online learning algorithm. Simulation results are given to verify the effectiveness of the proposed method.

Highlights

In the control field and in practical applications, reinforcement learning (RL) [1, 2] and adaptive dynamic programming (ADP) [3, 4] play a critical role in addressing optimal control problems. The purpose of optimal control is to design a stabilizing control law by minimizing a predefined performance function.
Much work on the regulation problem for optimal control using RL/ADP algorithms has been reported in recent years [5, 6]. The objective is to find an optimal control that maximizes or minimizes a measure of the system output energy and control actions, where the associated optimal control equations can be solved numerically via neural networks (NNs).
For completely unknown system dynamics, the results in [12] showed that a model-free policy iteration (PI) approach can be developed for CT linear systems, which calculates the optimal solution online using input/output measurements. This principle was subsequently extended to nonlinear systems in [13, 14].

Summary

Introduction

Motivated by the above observations, we propose a new RL algorithm to realize the optimal tracking control of robotic systems. To this end, the system model is rewritten in state-space form, which facilitates the realization of the optimal tracking control. A single critic NN is then applied to estimate the solution of the HJB equation and to update the optimal control action.

Notation: I is the identity matrix, and 0 × 0 denotes the zero matrix. λmax and λmin are the maximal and minimal eigenvalues of a matrix, respectively. diag{[a1, a2, a3, ..., an]} is a diagonal matrix with diagonal components a1, ..., an. (·)_x = ∂(·)/∂x denotes partial differentiation with respect to x.
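
The single-critic idea described above can be illustrated on a toy problem. The following is a minimal sketch, not the paper's algorithm or adaptive law: it assumes a scalar linear error system de/dt = a*e + b*u with quadratic cost q*e^2 + r*u^2, and a critic with a single quadratic basis function, V_hat(e) = w*e^2. All names and numerical values (a, b, q, r, `hjb_residual`, and so on) are hypothetical. With the ideal weight taken from the scalar Riccati equation, the control derived from the critic gradient drives the HJB residual to zero, which is the condition an online critic update would try to enforce.

```python
import math


def scalar_are_solution(a, b, q, r):
    """Positive root p of the scalar Riccati equation 2*a*p - (b*p)**2 / r + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


def critic_gradient(w, e):
    """Gradient of the critic V_hat(e) = w * e**2 with respect to e."""
    return 2.0 * w * e


def control_from_critic(w, e, b, r):
    """Control obtained from the critic alone: u = -(1/2) * (1/r) * b * dV_hat/de."""
    return -0.5 * b * critic_gradient(w, e) / r


def hjb_residual(w, e, a, b, q, r):
    """Hamiltonian q*e^2 + r*u^2 + dV_hat/de * (a*e + b*u) under the critic's control."""
    u = control_from_critic(w, e, b, r)
    return q * e * e + r * u * u + critic_gradient(w, e) * (a * e + b * u)


# Hypothetical plant and cost parameters, chosen only for illustration.
a, b, q, r = -1.0, 2.0, 1.0, 0.5

w_star = scalar_are_solution(a, b, q, r)  # ideal critic weight for this toy problem
residuals = [hjb_residual(w_star, e, a, b, q, r) for e in (-1.0, 0.3, 2.0)]
mismatch = hjb_residual(0.5, 1.0, a, b, q, r)  # a wrong weight leaves a nonzero residual
```

In this sketch the same weight w defines both the value estimate and the control, so no separate actor network is needed; a nonzero residual such as `mismatch` is the kind of error signal that an online learning rule could use to adjust w.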

Preliminaries and Problem Statement

Online Dynamic Tracking Control

Simulation