A new history experience replay design for model-free adaptive dynamic programming

Naresh Malla, Zhen Ni

doi:10.1016/j.neucom.2017.04.069

Abstract

An adaptive dynamic programming (ADP) controller is a powerful control technique that has been investigated, designed and tested in a wide range of applications for solving optimal control problems in complex systems. The ADP controller usually reaches good performance only after a long training period, because its data efficiency is low: each sample is discarded after it has been used once. History experience, also known as experience replay, is a powerful technique that has shown potential to accelerate the training of learning and control systems. However, existing history experience designs cannot be used directly in model-free ADP, because they rely on forward temporal difference (TD) information (e.g., state-action pairs). This information spans the current and the future time steps and therefore requires a model network to predict the future information. This paper proposes a new history experience replay design that avoids using a model network or an identifier of the system/environment. Specifically, we design the experience tuple to hold one-step-backward state-action information, so that the TD can be computed from the previous and the current time steps. In addition, a systematic approach is proposed to integrate history experience into both the critic and the action networks of the ADP controller. The proposed approach is tested on two case studies: a cart-pole balancing task and a triple-link pendulum balancing task. For a fair comparison, both approaches use the same initial states and the same initial weight parameters in the same simulation environment. The statistical results show that the proposed approach reduces the average number of trials required to succeed and increases the success rate. Overall, the proposed approach reduced the average number of trials required to succeed by 26.5% for the cart-pole task and by 43% for the triple-link balancing task.
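The idea of a backward-looking experience tuple can be illustrated with a small sketch. The Python below is a minimal, hypothetical illustration, not the authors' implementation. It assumes the critic error takes the standard direct (action-dependent) HDP form e_c(t) = alpha*J(t) - [J(t-1) - r(t)], where J is the critic output, r(t) the reinforcement signal and alpha the discount factor. It also assumes a linear critic J(x, u) = w . phi(x, u) with a user-supplied feature map phi and uniform sampling from the buffer. The names BackwardExperience, HistoryReplayBuffer, backward_td_error, linear_critic and replay_critic_update are illustrative only. Because each stored tuple carries x(t-1), u(t-1), r(t), x(t) and u(t), the TD error can be recomputed from stored data alone, without a model network that predicts x(t+1).

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Critic = Callable[[State, float], float]
Features = Callable[[State, float], Sequence[float]]


@dataclass(frozen=True)
class BackwardExperience:
    """Hypothetical experience tuple holding one-step-backward information."""
    prev_state: State   # x(t-1)
    prev_action: float  # u(t-1)
    reward: float       # r(t), reinforcement signal observed at time t
    state: State        # x(t)
    action: float       # u(t)


class HistoryReplayBuffer:
    """Fixed-capacity buffer; once full, the oldest tuples are dropped first."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._data: deque = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, exp: BackwardExperience) -> None:
        self._data.append(exp)

    def sample(self, batch_size: int) -> List[BackwardExperience]:
        # Uniform sampling without replacement is an assumption made for
        # this sketch, not the paper's selection rule.
        k = min(batch_size, len(self._data))
        return self._rng.sample(list(self._data), k)

    def __len__(self) -> int:
        return len(self._data)


def backward_td_error(critic: Critic, exp: BackwardExperience, alpha: float) -> float:
    """Assumed direct-HDP critic error e_c = alpha*J(t) - [J(t-1) - r(t)].

    Both critic evaluations use stored (past and current) data only,
    so no model network is needed to predict the next state.
    """
    j_now = critic(exp.state, exp.action)
    j_prev = critic(exp.prev_state, exp.prev_action)
    return alpha * j_now - (j_prev - exp.reward)


def linear_critic(weights: Sequence[float], features: Features) -> Critic:
    """Hypothetical linear critic J(x, u) = w . phi(x, u)."""
    def critic(x: State, u: float) -> float:
        return sum(wi * fi for wi, fi in zip(weights, features(x, u)))
    return critic


def replay_critic_update(weights: Sequence[float], features: Features,
                         buffer: HistoryReplayBuffer, batch_size: int,
                         alpha: float, lr: float) -> List[float]:
    """One replay pass of gradient steps on 0.5 * e_c**2 over sampled tuples.

    Only alpha*J(t) is differentiated; J(t-1) is treated as a fixed target.
    """
    w = list(weights)
    for exp in buffer.sample(batch_size):
        e_c = backward_td_error(linear_critic(w, features), exp, alpha)
        phi_now = features(exp.state, exp.action)
        # d(0.5 * e_c**2)/dw = e_c * alpha * phi(x(t), u(t))
        w = [wi - lr * e_c * alpha * fi for wi, fi in zip(w, phi_now)]
    return w
```

In use, the controller would append one BackwardExperience per time step (after observing x(t) and choosing u(t)) and call replay_critic_update alongside its usual online update. The action-network update, which the abstract states also uses history experience, is omitted from this sketch.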

Journal: Neurocomputing
Publication Date: May 18, 2017
Citations: 16

Similar Papers

Prioritizing Useful Experience Replay for Heuristic Dynamic Programming-Based Learning Systems
Zhen Ni ... Naresh Malla
IEEE Transactions on Cybernetics | Vol. 49 | 30 Jul 2018

Supplementary control for virtual synchronous machine based on adaptive dynamic programming
Naresh Malla ... Dipesh Shrestha
01 Jul 2016

An Improved Adaptive Dynamic Programming Algorithm Based on Fuzzy Extended State Observer for Dissolved Oxygen Concentration Control
Xueliang Chen ... Xin Peng
Processes | Vol. 10 | 07 Dec 2022

Numerical simulation of Richards equation in partially saturated porous media: under-relaxation and mass balance
Kok-Kwang Phoon ... Pui-Chih Chong
Geotechnical and Geological Engineering | Vol. 25 | 11 May 2007
