Abstract

The past decade has witnessed a surge in research activity on adaptive dynamic programming (ADP) and reinforcement learning (RL), particularly for control applications. Several books [items 1)–5) in the Appendix] and survey papers [items 6)–10) in the Appendix] have been published on the subject. Both ADP and RL provide approximate solutions to dynamic programming problems. In a 1995 article, Barto et al. [item 11) in the Appendix] introduced the so-called "adaptive real-time dynamic programming," which applies ADP specifically to real-time control. Later, in 2002, Murray et al. [item 12) in the Appendix] developed an ADP algorithm for the optimal control of continuous-time affine nonlinear systems. The most famous algorithms in RL, on the other hand, are the temporal difference algorithm [item 13) in the Appendix] and the Q-learning algorithm [items 14) and 15) in the Appendix].
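As an illustration of the Q-learning algorithm mentioned above, the sketch below shows one tabular update of Watkins' rule, Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. The state/action sizes, learning rate, discount factor, and the sample transition are all hypothetical values chosen for demonstration, not taken from the cited works.

```python
import numpy as np

# Illustrative tabular Q-learning update (hypothetical parameters).
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.5, 0.9  # learning rate and discount factor (assumed values)

def q_update(Q, s, a, r, s_next):
    """Apply one temporal-difference update to the action-value table."""
    td_target = r + gamma * Q[s_next].max()   # bootstrap from the next state
    Q[s, a] += alpha * (td_target - Q[s, a])  # move Q(s,a) toward the target
    return Q

# A single hypothetical transition: state 0, action 1, reward 1.0, next state 2.
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)
print(Q[0, 1])  # 0.5
```

Repeating such updates over sampled transitions, with a decaying learning rate and sufficient exploration, is what lets Q-learning approximate the dynamic programming solution without a model of the system.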
