Abstract

This paper compares the performance of the Howard (1960) policy iteration algorithm for infinite-horizon continuous-state Markovian decision processes (MDP's) using alternative random, quasi-random, and deterministic discretizations of the state space, or grids. Each grid corresponds to an embedded finite-state Markovian decision process whose solution is used to approximate the solution to the original continuous-state Markovian decision process. Extending a result of Rust (1997), I show that policy iteration using random grids succeeds in breaking the curse of dimensionality involved in approximating the solution to a class of continuous-state, discrete-action MDP's known as discrete decision processes (DDP's). I compare this random policy iteration algorithm (RPI) with policy iteration algorithms using deterministically chosen grids, including uniform and quadrature grids, both of which are subject to the curse of dimensionality. I also compare the RPI to deterministic policy iteration algorithms based on quasi-random or low discrepancy grids such as the Sobol and Tezuka sequences. While an analysis of the worst-case computational complexity of the DDP problem shows that any deterministic solution method is subject to an inherent curse of dimensionality, my numerical comparisons reveal that in the test problems considered, policy iteration using the deterministic, low discrepancy grids was superior to the RPI algorithm. The RPI, in turn, outperformed deterministic policy iteration using methods based on uniform and quadrature grids even in one- and two-dimensional test problems when the transition density in the MDP problem is sufficiently smooth, but can be inferior to the latter methods in problems where the transition density has large discontinuities or spikes, violating the regularity conditions needed to establish the uniform convergence of the RPI algorithm. This finding suggests that policy iteration algorithms using low discrepancy grids may succeed in breaking the curse of dimensionality in average-case settings, since in multivariate problems the rate of convergence of these methods exceeds the rate of convergence of methods based on random grids and other deterministically chosen grids, and low discrepancy methods thus tend to outperform these alternatives in problems where there are no large spikes or discontinuities in the transition density of the MDP problem.
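To make the embedded finite-state construction concrete, the following is a minimal Python sketch on a toy one-dimensional DDP, not the paper's actual test problems: the utility u, the transition density p, the discount factor, and the action set are hypothetical stand-ins; the Sobol points come from SciPy's scipy.stats.qmc module (Tezuka sequences are not shown); and the sampled densities are row-normalized so that each grid induces a proper finite-state Markov chain, which is one common way to form the embedded MDP and not necessarily the paper's exact random Bellman operator.

```python
import numpy as np
from scipy.stats import qmc  # Sobol low discrepancy generator

BETA = 0.95                  # hypothetical discount factor
ACTIONS = [0.0, 0.5, 1.0]    # hypothetical discrete action set

def u(s, a):
    """Per-period utility: an illustrative smooth reward."""
    return -(s - a) ** 2

def p(s_next, s, a):
    """Transition kernel on [0, 1]: an illustrative smooth density.
    Smoothness of p is what the RPI convergence theory relies on."""
    return np.exp(-0.5 * ((s_next - 0.7 * s - 0.3 * a) / 0.2) ** 2)

def embedded_mdp(grid):
    """Build the embedded finite-state MDP on a given grid:
    rewards R[a, i] and transition matrices P[a, i, j].
    Row-normalizing the sampled densities makes each P[a] a proper
    stochastic matrix, so policy iteration is well defined."""
    n = len(grid)
    R = np.array([[u(s, a) for s in grid] for a in ACTIONS])
    P = np.empty((len(ACTIONS), n, n))
    for k, a in enumerate(ACTIONS):
        for i, s in enumerate(grid):
            w = p(grid, s, a)
            P[k, i] = w / w.sum()
    return R, P

def policy_iteration(R, P, max_iter=500):
    """Howard's policy iteration on the embedded finite MDP."""
    n_actions, n = R.shape
    policy = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        # Policy evaluation: solve (I - beta * P_pi) V = R_pi exactly.
        P_pi = P[policy, np.arange(n)]
        R_pi = R[policy, np.arange(n)]
        V = np.linalg.solve(np.eye(n) - BETA * P_pi, R_pi)
        # Policy improvement: one-step lookahead over all actions.
        Q = R + BETA * (P @ V)
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return V, policy

N = 256  # power of 2, as Sobol balance properties prefer
rng = np.random.default_rng(0)
grids = {
    "uniform": np.linspace(0.0, 1.0, N),                        # deterministic
    "random": rng.uniform(0.0, 1.0, N),                         # RPI-style grid
    "sobol": qmc.Sobol(d=1, scramble=False).random(N).ravel(),  # low discrepancy
}
for name, grid in grids.items():
    V, _ = policy_iteration(*embedded_mdp(grid))
    print(f"{name:8s} mean value on its grid: {V.mean():.6f}")
```

Under this construction, swapping the grid while holding the solver fixed isolates the effect of the discretization, which mirrors the comparison the paper performs across random, uniform, quadrature, and low discrepancy grids.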
