Abstract

In this paper, a deterministic policy gradient adaptive dynamic programming (DPGADP) algorithm is proposed for solving model-free optimal control problems of discrete-time nonlinear systems. Using measured system data, the developed algorithm improves control performance via the policy gradient method. The convergence of the DPGADP algorithm is demonstrated by showing that the constructed Q-function sequence is monotonically non-increasing and converges to the optimal Q-function. An actor-critic neural network (NN) structure is established to implement the DPGADP algorithm. Experience replay and target network techniques from deep Q-learning are employed during training. The stability of the NN weight error dynamics is also analyzed. Finally, two simulation examples are presented to verify the effectiveness of the proposed method.
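To make the ingredients named in the abstract concrete, the sketch below illustrates a generic actor-critic update with experience replay and soft target networks for a cost-minimization setting, in the spirit of deterministic policy gradient methods. It is a minimal illustration only, not the paper's DPGADP implementation: the use of PyTorch, the network sizes, the learning rates, the soft target-update rate, and all identifiers are assumptions made for the example.

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Hypothetical problem sizes and hyperparameters for illustration only.
STATE_DIM, ACTION_DIM = 4, 1
GAMMA, TAU, LR = 0.99, 0.005, 1e-3

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                         nn.Linear(64, out_dim))

critic = mlp(STATE_DIM + ACTION_DIM, 1)      # critic approximates Q(x, u)
actor = mlp(STATE_DIM, ACTION_DIM)           # actor approximates u = mu(x)
critic_tgt = mlp(STATE_DIM + ACTION_DIM, 1)  # target critic
actor_tgt = mlp(STATE_DIM, ACTION_DIM)       # target actor
critic_tgt.load_state_dict(critic.state_dict())
actor_tgt.load_state_dict(actor.state_dict())

critic_opt = torch.optim.Adam(critic.parameters(), lr=LR)
actor_opt = torch.optim.Adam(actor.parameters(), lr=LR)
buffer = deque(maxlen=100_000)               # experience replay memory

def train_step(batch_size=64):
    if len(buffer) < batch_size:
        return
    # Each stored transition is a tuple of tensors with shapes
    # (STATE_DIM,), (ACTION_DIM,), (1,), (STATE_DIM,).
    batch = random.sample(buffer, batch_size)
    x, u, c, x_next = map(torch.stack, zip(*batch))

    # Critic step: regress Q(x, u) toward a cost-to-go target
    # bootstrapped from the frozen target networks (c is the stage cost).
    with torch.no_grad():
        q_next = critic_tgt(torch.cat([x_next, actor_tgt(x_next)], dim=-1))
        target = c + GAMMA * q_next
    q = critic(torch.cat([x, u], dim=-1))
    critic_loss = nn.functional.mse_loss(q, target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor step: deterministic policy gradient for a minimization
    # problem, i.e. descend Q with respect to the policy's action.
    actor_loss = critic(torch.cat([x, actor(x)], dim=-1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft update of the target networks toward the learned networks.
    for net, tgt in ((critic, critic_tgt), (actor, actor_tgt)):
        for p, p_t in zip(net.parameters(), tgt.parameters()):
            p_t.data.mul_(1 - TAU).add_(TAU * p.data)
```

In a full implementation, measured transitions (state, applied control, stage cost, next state) would be appended to `buffer` as tensors of the shapes noted above before `train_step` is called repeatedly.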
