Abstract

This article reviews a large literature on numerical methods for finding approximate optimal or equilibrium solutions to sequential decision processes and dynamic games using dynamic programming, the name Bellman gave to a recursive procedure for solving complex decision problems by backward induction. The main challenge is a problem Bellman called the curse of dimensionality: the exponential increase in the computer power needed to solve increasingly complex, realistic dynamic programs. The article reviews the literature on information-based complexity, which has enabled computer scientists to rigorously formalize the curse of dimensionality in terms of worst-case complexity bounds that increase exponentially in the number of continuous state and control variables entering the dynamic programming problem. For the most general classes of dynamic programming problems, this literature demonstrates that the curse of dimensionality is a fundamental problem that cannot be circumvented by any algorithm, no matter how clever. However, some subclasses of dynamic programming problems have special structure that can be exploited to break the curse of dimensionality, sometimes via randomized algorithms similar to Monte Carlo integration. The disadvantage of randomized algorithms is the stochastic noise in the approximate solutions, although this "noise" can be made as small as desired at the cost of greater computational effort. The article then reviews a large literature on deterministic algorithms for solving finite and infinite horizon dynamic programming problems, which are used in practice to provide accurate solutions to low-to-moderate dimensional problems. It also reviews a more recent literature on approximate dynamic programming (or neurodynamic programming), a family of methods proposed to approximately solve more challenging high-dimensional problems. These are typically iterative, stochastic methods inspired by the literature on artificial intelligence and reinforcement learning; they include a method called Q-learning, which can be described as "model free" because it relies on "training" methods from the reinforcement learning literature that require only simulated realizations of the controlled stochastic process rather than an explicit representation of it (as a transition probability, for example). These methods were originally developed for problems with finite state and action spaces, but they have been extended to problems with continuous state and action spaces (or very many finite states and actions) using function approximation and nonlinear regression techniques, including neural networks, hence the name neurodynamic programming. Although the analysis of the convergence of these methods to the true solution is incomplete and computable bounds between approximate and true solutions are often unavailable, there have been some impressive applications of these techniques, most recently Google's development of the AlphaGo program, which has defeated the world's best human player in the game of Go, a complex game whose number of possible board positions exceeds the number of atoms in the universe.
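
As a concrete illustration of the two approaches contrasted above, the sketch below (not taken from the article; the tiny two-state Markov decision process, its transition probabilities, and its rewards are purely hypothetical) solves a small finite-horizon problem exactly by backward induction and then estimates the same values by model-free tabular Q-learning using only simulated transitions.

```python
# Minimal sketch: exact backward induction vs. model-free tabular Q-learning
# on a hypothetical two-state, two-action discounted MDP.
import random

# Hypothetical MDP: P[s][a] lists (next_state, probability); R[s][a] is the expected reward.
P = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(0, 0.2), (1, 0.8)]},
    1: {0: [(0, 0.5), (1, 0.5)], 1: [(0, 0.1), (1, 0.9)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 2.0, 1: 0.5}}
STATES, ACTIONS = [0, 1], [0, 1]
BETA = 0.95  # discount factor


def backward_induction(horizon):
    """Exact finite-horizon DP: start from V_T = 0 and recurse backwards via the Bellman equation."""
    V = {s: 0.0 for s in STATES}  # terminal value function
    policy = []
    for _ in range(horizon):
        newV, decision = {}, {}
        for s in STATES:
            q = {a: R[s][a] + BETA * sum(p * V[s2] for s2, p in P[s][a])
                 for a in ACTIONS}
            decision[s] = max(q, key=q.get)
            newV[s] = q[decision[s]]
        V = newV
        policy.append(decision)
    policy.reverse()  # policy[t] is the decision rule for period t
    return V, policy


def simulate(s, a):
    """Draw one transition from the model; a 'model-free' learner only ever sees these draws."""
    u, cum = random.random(), 0.0
    for s2, p in P[s][a]:
        cum += p
        if u <= cum:
            return R[s][a], s2
    return R[s][a], P[s][a][-1][0]


def q_learning(episodes=5000, alpha=0.1, eps=0.1, steps=50):
    """Tabular Q-learning: update Q(s,a) from simulated transitions only (infinite horizon, discounted)."""
    Q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    for _ in range(episodes):
        s = random.choice(STATES)
        for _ in range(steps):
            # epsilon-greedy action choice
            a = random.choice(ACTIONS) if random.random() < eps else max(Q[s], key=Q[s].get)
            r, s2 = simulate(s, a)
            Q[s][a] += alpha * (r + BETA * max(Q[s2].values()) - Q[s][a])
            s = s2
    return Q


if __name__ == "__main__":
    V, policy = backward_induction(horizon=100)
    print("Backward-induction values:", V)
    print("Q-learning estimates:     ", q_learning())
```

Backward induction enumerates every state and action at each stage, which is exactly the work that grows exponentially as continuous state variables are added (the curse of dimensionality). The Q-learning estimates, by contrast, need no explicit transition probabilities, but they are stochastic and, being infinite-horizon, only approximately match the long-horizon discounted values.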
