Abstract

In this paper, a deterministic policy gradient adaptive dynamic programming (DPGADP) algorithm is proposed for solving model-free optimal control problems of discrete-time nonlinear systems. Using measured system data, the developed algorithm improves control performance via the policy gradient method. The convergence of the DPGADP algorithm is demonstrated by showing that the constructed Q-function sequence is monotonically non-increasing and converges to the optimal Q-function. An actor-critic neural network (NN) structure is established to implement the DPGADP algorithm. Experience replay and target network techniques from deep Q-learning are employed during training. The stability of the NN weight error dynamics is also analyzed. Finally, two simulation examples are presented to verify the effectiveness of the proposed method.
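To make the ingredients named in the abstract concrete, the sketch below illustrates a generic actor-critic update with experience replay and soft target networks for a cost-minimization setting, in the spirit of deterministic policy gradient methods. It is a minimal illustration only, not the paper's DPGADP implementation: the use of PyTorch, the network sizes, the learning rates, the soft target-update rate, and all identifiers are assumptions made for the example.

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Hypothetical problem sizes and hyperparameters for illustration only.
STATE_DIM, ACTION_DIM = 4, 1
GAMMA, TAU, LR = 0.99, 0.005, 1e-3

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                         nn.Linear(64, out_dim))

critic = mlp(STATE_DIM + ACTION_DIM, 1)      # critic approximates Q(x, u)
actor = mlp(STATE_DIM, ACTION_DIM)           # actor approximates u = mu(x)
critic_tgt = mlp(STATE_DIM + ACTION_DIM, 1)  # target critic
actor_tgt = mlp(STATE_DIM, ACTION_DIM)       # target actor
critic_tgt.load_state_dict(critic.state_dict())
actor_tgt.load_state_dict(actor.state_dict())

critic_opt = torch.optim.Adam(critic.parameters(), lr=LR)
actor_opt = torch.optim.Adam(actor.parameters(), lr=LR)
buffer = deque(maxlen=100_000)               # experience replay memory

def train_step(batch_size=64):
    if len(buffer) < batch_size:
        return
    # Each stored transition is a tuple of tensors with shapes
    # (STATE_DIM,), (ACTION_DIM,), (1,), (STATE_DIM,).
    batch = random.sample(buffer, batch_size)
    x, u, c, x_next = map(torch.stack, zip(*batch))

    # Critic step: regress Q(x, u) toward a cost-to-go target
    # bootstrapped from the frozen target networks (c is the stage cost).
    with torch.no_grad():
        q_next = critic_tgt(torch.cat([x_next, actor_tgt(x_next)], dim=-1))
        target = c + GAMMA * q_next
    q = critic(torch.cat([x, u], dim=-1))
    critic_loss = nn.functional.mse_loss(q, target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor step: deterministic policy gradient for a minimization
    # problem, i.e. descend Q with respect to the policy's action.
    actor_loss = critic(torch.cat([x, actor(x)], dim=-1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft update of the target networks toward the learned networks.
    for net, tgt in ((critic, critic_tgt), (actor, actor_tgt)):
        for p, p_t in zip(net.parameters(), tgt.parameters()):
            p_t.data.mul_(1 - TAU).add_(TAU * p.data)
```

In a full implementation, measured transitions (state, applied control, stage cost, next state) would be appended to `buffer` as tensors of the shapes noted above before `train_step` is called repeatedly.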
