Reinforcement Learning and Dynamic Programming

Andrew G Barto

doi:10.1016/s1474-6670(17)45266-9

Abstract

Reinforcement learning refers to a class of learning tasks and algorithms based on experimental psychology’s principle of reinforcement. Recent research uses the framework of stochastic optimal control to model problems in which a learning agent must incrementally approximate an optimal control rule, or policy, often starting with incomplete information about the dynamics of its environment. Although these problems have been studied intensively for many years, the methods being developed by reinforcement learning researchers add some novel elements to classical dynamic programming solution methods. This article provides a brief account of these methods, explains what is novel about them, and suggests what their advantages might be over classical applications of dynamic programming to large-scale stochastic optimal control problems.

Full Text