Abstract

This article reviews a large literature on numerical methods for finding approximate optimal or equilibrium solutions to sequential decision processes and dynamic games using dynamic programming, the name Bellman gave to a recursive procedure for solving complex decision problems by backward induction. The main challenge is a problem Bellman called the curse of dimensionality: the exponential increase in the computer power needed to solve increasingly complex, realistic dynamic programs. The article reviews the literature on information-based complexity, which has enabled computer scientists to rigorously formalize the curse of dimensionality in terms of worst-case complexity bounds that increase exponentially in the number of continuous state and control variables entering the dynamic programming problem. For the most general classes of dynamic programming problems, this literature demonstrates that the curse of dimensionality is a fundamental problem that cannot be circumvented by any algorithm, no matter how clever. However, some subclasses of dynamic programming problems have special structure that can be exploited to break the curse of dimensionality, sometimes via randomized algorithms similar to Monte Carlo integration. The disadvantage of randomized algorithms is the stochastic noise in the approximate solutions, although this "noise" can be made as small as desired at the cost of greater computational effort. The article then reviews a large literature on deterministic algorithms for solving finite and infinite horizon dynamic programming problems, which are used in practice to provide accurate solutions to low-to-moderate dimensional problems. It also reviews a more recent literature on approximate dynamic programming (or neurodynamic programming), a family of methods proposed to approximately solve more challenging high-dimensional problems. These are typically iterative, stochastic methods inspired by the literature on artificial intelligence and reinforcement learning; they include a method called Q-learning, which can be described as "model free" because it relies on "training" methods from the reinforcement learning literature that require only simulated realizations of the controlled stochastic process rather than an explicit representation of it (as a transition probability, for example). These methods were originally developed for problems with finite state and action spaces, but they have been extended to problems with continuous state and action spaces (or very many finite states and actions) using function approximation and nonlinear regression techniques, including neural networks, hence the name neurodynamic programming. Although the analysis of the convergence of these methods to the true solution is incomplete and computable bounds between approximate and true solutions are often unavailable, there have been some impressive applications of these techniques, most recently Google's development of the AlphaGo program, which has defeated the world's best human player in the game of Go, a complex game whose number of possible board positions exceeds the number of atoms in the universe.
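
As a concrete illustration of the two approaches contrasted above, the sketch below (not taken from the article; the tiny two-state Markov decision process, its transition probabilities, and its rewards are purely hypothetical) solves a small finite-horizon problem exactly by backward induction and then estimates the same values by model-free tabular Q-learning using only simulated transitions.

```python
# Minimal sketch: exact backward induction vs. model-free tabular Q-learning
# on a hypothetical two-state, two-action discounted MDP.
import random

# Hypothetical MDP: P[s][a] lists (next_state, probability); R[s][a] is the expected reward.
P = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(0, 0.2), (1, 0.8)]},
    1: {0: [(0, 0.5), (1, 0.5)], 1: [(0, 0.1), (1, 0.9)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 2.0, 1: 0.5}}
STATES, ACTIONS = [0, 1], [0, 1]
BETA = 0.95  # discount factor


def backward_induction(horizon):
    """Exact finite-horizon DP: start from V_T = 0 and recurse backwards via the Bellman equation."""
    V = {s: 0.0 for s in STATES}  # terminal value function
    policy = []
    for _ in range(horizon):
        newV, decision = {}, {}
        for s in STATES:
            q = {a: R[s][a] + BETA * sum(p * V[s2] for s2, p in P[s][a])
                 for a in ACTIONS}
            decision[s] = max(q, key=q.get)
            newV[s] = q[decision[s]]
        V = newV
        policy.append(decision)
    policy.reverse()  # policy[t] is the decision rule for period t
    return V, policy


def simulate(s, a):
    """Draw one transition from the model; a 'model-free' learner only ever sees these draws."""
    u, cum = random.random(), 0.0
    for s2, p in P[s][a]:
        cum += p
        if u <= cum:
            return R[s][a], s2
    return R[s][a], P[s][a][-1][0]


def q_learning(episodes=5000, alpha=0.1, eps=0.1, steps=50):
    """Tabular Q-learning: update Q(s,a) from simulated transitions only (infinite horizon, discounted)."""
    Q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    for _ in range(episodes):
        s = random.choice(STATES)
        for _ in range(steps):
            # epsilon-greedy action choice
            a = random.choice(ACTIONS) if random.random() < eps else max(Q[s], key=Q[s].get)
            r, s2 = simulate(s, a)
            Q[s][a] += alpha * (r + BETA * max(Q[s2].values()) - Q[s][a])
            s = s2
    return Q


if __name__ == "__main__":
    V, policy = backward_induction(horizon=100)
    print("Backward-induction values:", V)
    print("Q-learning estimates:     ", q_learning())
```

Backward induction enumerates every state and action at each stage, which is exactly the work that grows exponentially as continuous state variables are added (the curse of dimensionality). The Q-learning estimates, by contrast, need no explicit transition probabilities, but they are stochastic and, being infinite-horizon, only approximately match the long-horizon discounted values.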
