Reversible Markov Decision Processes with an Average-Reward Criterion

Randy Cogill,Cheng Peng

doi:10.1137/110844957

Abstract

A wide variety of stochastic models have an important property known as reversibility. The analysis of reversible Markov chains is often significantly simpler than analysis of general Markov chains, particularly since there are often simple closed-form expressions for their invariant probability mass functions. In this paper we study the structure of optimal control policies for Markov decision processes with reversible dynamics. For optimal control of Markov decision processes, significant analytical and computational simplifications can arise as a result of reversibility. Here we present an analysis of a special class of reversible Markov decision processes, namely those that contain a cycle through their communication graph under any policy. These results provide encouraging evidence that similar simplifications hold for more general models, such as reversible processes without cycles and partition reversible processes. These results are demonstrated on a control-based variant of the Metropolis--Hastin...

Full Text