Abstract

A model-free optimal tracking controller is designed for discrete-time nonlinear systems through policy gradient adaptive critic designs (PGACDs) with experience replay (ER). By using system transformation, optimal tracking control problems are converted into optimal regulation problems. An off-policy PGACD algorithm is developed to minimize the iterative $Q$-function and improve the tracking control performance. The proposed method is realized with a critic network and an actor network (AN), which approximate the iterative $Q$-function and the iterative control policy, respectively. The policy gradient technique is then introduced to derive a novel weight-updating law for the AN explicitly, using measured system data only. Convergence of the iteration is established through theoretical analysis, and uniform ultimate boundedness of the closed-loop system under the PGACD-based controller is demonstrated by Lyapunov's direct method. To guarantee stability of the learning process and increase data efficiency, an ER-based learning framework is designed to improve the realizability of the proposed method. Finally, simulation results for two examples demonstrate the performance of the off-policy PGACD algorithm.
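
To make the learning loop concrete, the following is a minimal, runnable sketch of an off-policy actor-critic Q-learning iteration with experience replay, in the spirit of the scheme summarized above. The quadratic critic features, the linear actor, the toy plant, the cost function, and all learning rates are illustrative assumptions for this sketch only; the paper's actual network architectures and weight-updating laws are not reproduced here.

```python
# Hedged sketch of an off-policy actor-critic Q-learning loop with
# experience replay (ER). All names and numerical choices below
# (pgacd_step, quadratic critic, linear actor, toy plant) are
# illustrative assumptions, not the paper's actual design.
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 1            # state and control dimensions (assumed)
gamma = 0.95           # discount factor (assumed)
lr_c, lr_a = 1e-2, 1e-3  # critic / actor step sizes (assumed)

# Critic stand-in: Q(x, u) = z^T M z with z = [x; u], M reshaped from w_c.
w_c = 0.01 * rng.standard_normal((n + m) ** 2)
# Actor stand-in: u = W_a @ x, a linear policy in place of the actor network.
W_a = 0.01 * rng.standard_normal((m, n))

def q_value(x, u, w):
    z = np.concatenate([x, u])
    M = w.reshape(n + m, n + m)
    return z @ M @ z

def pgacd_step(batch):
    """One critic/actor update from a replay mini-batch of
    (x, u, r, x_next) transitions measured on the system."""
    global w_c, W_a
    for x, u, r, x_next in batch:
        # Critic: gradient step on the squared Bellman residual of the
        # iterative Q-function, with the actor's action at x_next.
        u_next = W_a @ x_next
        target = r + gamma * q_value(x_next, u_next, w_c)
        z = np.concatenate([x, u])
        td = q_value(x, u, w_c) - target
        w_c -= lr_c * td * np.outer(z, z).ravel()
        # Actor: policy gradient descent on Q, moving W_a along
        # -dQ/du * du/dW_a evaluated at u = pi(x), data only.
        M = w_c.reshape(n + m, n + m)
        zx = np.concatenate([x, W_a @ x])
        dq_du = ((M + M.T) @ zx)[n:]   # last m entries are dQ/du
        W_a -= lr_a * np.outer(dq_du, x)

# Usage: collect transitions under an exploratory behavior policy,
# store them in a replay buffer, and learn off-policy from replays.
A, B = np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[0.0], [0.1]])  # toy plant (assumed)
buffer = []
x = rng.standard_normal(n)
for _ in range(2000):
    u = W_a @ x + 0.1 * rng.standard_normal(m)  # exploration noise
    r = x @ x + u @ u                           # stage cost (assumed)
    x_next = A @ x + B @ u
    buffer.append((x, u, r, x_next))
    x = x_next if np.linalg.norm(x_next) < 1e3 else rng.standard_normal(n)
    if len(buffer) >= 32:
        idx = rng.choice(len(buffer), size=32, replace=False)
        pgacd_step([buffer[i] for i in idx])
```

The actor step descends the gradient of the approximated $Q$-function with respect to the actor weights, so no model of the plant is needed; the replay buffer lets transitions gathered under an exploratory behavior policy be reused off-policy, which is the data-efficiency point of the ER-based framework.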
