Empirical Dynamic Programming

William B Haskell, Dileep Kalathil, Rahul Jain

doi:10.1287/moor.2015.0733

Abstract

We propose empirical dynamic programming algorithms for Markov decision processes. In these algorithms, the exact expectation in the Bellman operator in classical value iteration is replaced by an empirical estimate to get “empirical value iteration” (EVI). Policy evaluation and policy improvement in classical policy iteration are also replaced by simulation to get “empirical policy iteration” (EPI). Thus, these empirical dynamic programming algorithms involve iteration of a random operator, the empirical Bellman operator. We introduce notions of probabilistic fixed points for such random monotone operators and develop a stochastic dominance framework for their convergence analysis. We then use this framework to derive sample complexity bounds for both EVI and EPI. We also present several variations and extensions: asynchronous empirical dynamic programming, the minimax empirical dynamic program, and an application to the dynamic newsvendor problem. Preliminary experimental results suggest a faster rate of convergence than stochastic approximation algorithms.
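The EVI idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a finite, discounted, cost-minimizing MDP, and the names `empirical_value_iteration`, `exact_value_iteration`, `P`, `cost`, `gamma`, `n_samples`, and `n_iters` are hypothetical. The transition array `P` is used only as a simulator from which next states are sampled; the exact version is included for comparison.

```python
import numpy as np


def empirical_value_iteration(P, cost, gamma, n_samples, n_iters, rng):
    """Iterate an empirical Bellman operator on a finite MDP (illustrative sketch).

    P:    (S, A, S) transition probabilities, used only to simulate next states.
    cost: (S, A) one-step costs (assumption: costs are minimized).
    """
    S, A, _ = P.shape
    v = np.zeros(S)
    for _ in range(n_iters):
        q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                # Fresh next-state samples replace the exact expectation E[v(s')].
                nxt = rng.choice(S, size=n_samples, p=P[s, a])
                q[s, a] = cost[s, a] + gamma * v[nxt].mean()
        # Greedy minimization over actions, as in classical value iteration.
        v = q.min(axis=1)
    return v


def exact_value_iteration(P, cost, gamma, n_iters):
    """Classical value iteration with the exact expectation, for comparison."""
    v = np.zeros(P.shape[0])
    for _ in range(n_iters):
        v = (cost + gamma * (P @ v)).min(axis=1)
    return v
```

To try it, one could build a small random MDP with `rng = np.random.default_rng(0)`, normalize a random `(S, A, S)` array along its last axis to obtain `P`, and compare the two value vectors. Because each iteration draws new samples, the empirical iterates do not settle on a single point; they fluctuate around the exact fixed point, which is the behavior the paper's notion of a probabilistic fixed point is meant to capture.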

Journal: Mathematics of Operations Research
Publication Date: May 1, 2016
Citations: 92

Similar Papers

A Simulation-Based Policy Iteration Algorithm for Average Cost Unichain Markov Decision Processes
Ying He ... Michael C Fu
01 Jan 1999

Efficient Algorithms for Budget-Constrained Markov Decision Processes
Constantine Caramanis ... Nedialko B Dimitrov
IEEE Transactions on Automatic Control | VOL. 59
01 Oct 2014

Model-Free λ-Policy Iteration for Discrete-Time Linear Quadratic Regulation
Yongliang Yang ... Chengzhong Xu
IEEE Transactions on Neural Networks and Learning Systems | VOL. 34
01 Feb 2023

A Mixed Value and Policy Iteration Method for Stochastic Control with Universally Measurable Policies
Huizhen Yu ... Dimitri P Bertsekas
Mathematics of Operations Research | VOL. 40
01 Oct 2015
