Abstract

Markov Decision Process Tailoring for Approximation Design

Optimal control problems on large state spaces are difficult to solve, calling for the development of approximate solution methods. In "A Low-rank Approximation for MDPs via Moment Coupling," Zhang and Gurvich introduce a novel framework to approximate Markov decision processes (MDPs) that stands on two pillars: (i) state aggregation, as the algorithmic infrastructure, and (ii) central-limit-theorem-type approximations, as the mathematical underpinning. The theoretical guarantees are grounded in the approximation of the Bellman equation by a partial differential equation (PDE) where, in the spirit of the central limit theorem, the transition matrix of the controlled Markov chain is reduced to its local first and second moments. Instead of solving the PDE, the algorithm introduced in the paper constructs a "sister" (controlled) Markov chain whose two local transition moments are approximately identical to those of the focal chain. Because of this moment matching, the original chain and its sister are coupled through the PDE, facilitating optimality guarantees. Embedded into standard soft aggregation, moment matching provides a disciplined mechanism to tune the aggregation and disaggregation probabilities.
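To make the moment-matching idea concrete, here is a minimal illustrative sketch (not the authors' algorithm): for a one-dimensional birth-death chain, compute the local first and second transition moments, then build a "sister" chain on a coarser grid whose up/down probabilities reproduce those moments exactly. The example chain, grid spacing, and all function names are hypothetical choices for illustration.

```python
import numpy as np

def local_moments(P, states):
    """First two local transition moments of a chain with kernel P:
    mu(x) = E[X' - x | x] and m2(x) = E[(X' - x)^2 | x]."""
    jumps = states[None, :] - states[:, None]   # jump sizes x' - x
    mu = (P * jumps).sum(axis=1)
    m2 = (P * jumps**2).sum(axis=1)
    return mu, m2

def sister_probs(mu, m2, h):
    """Up/down/stay probabilities on a grid of spacing h chosen so the
    sister chain's local mean and second moment match mu and m2."""
    p_up = (m2 + h * mu) / (2 * h**2)
    p_dn = (m2 - h * mu) / (2 * h**2)
    return p_up, p_dn, 1.0 - p_up - p_dn

# Hypothetical fine chain: random walk on {0,...,10},
# step +1 w.p. 0.3, step -1 w.p. 0.2, stay otherwise.
N = 10
states = np.arange(N + 1, dtype=float)
P = np.zeros((N + 1, N + 1))
for i in range(N + 1):
    up = 0.3 if i < N else 0.0
    dn = 0.2 if i > 0 else 0.0
    if i < N:
        P[i, i + 1] = up
    if i > 0:
        P[i, i - 1] = dn
    P[i, i] = 1.0 - up - dn

mu, m2 = local_moments(P, states)
# At an interior state, mu = 0.1 and m2 = 0.5; match them on a grid twice as coarse.
p_up, p_dn, p_stay = sister_probs(mu[5], m2[5], h=2.0)
```

Note that the sister chain's local mean, h*(p_up - p_dn), and second moment, h^2*(p_up + p_dn), equal the fine chain's by construction; this is the coupling through the first two moments that underlies the PDE-based guarantees described above.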

