Abstract

In this paper, a mapping is developed between the ‘multichain’ and ‘unichain’ linear programs for average reward Markov decision processes (MDPs) with multiple constraints on average expected costs. Our approach exploits the communicating property of MDPs. The mapping is used not only to prove that the unichain linear program solves average reward communicating MDPs with multiple constraints on average expected costs, but also to show that the optimal gain for such communicating MDPs is constant, i.e. independent of the initial state.
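For illustration, a standard statement of the unichain linear program with cost constraints is sketched below; the notation (state-action frequencies x(s,a), reward r, costs c_k, cost budgets d_k, and transition kernel p) is generic and may not match the paper's own formulation.

\[
\begin{aligned}
\max_{x \ge 0} \quad & \sum_{s \in S} \sum_{a \in A(s)} r(s,a)\, x(s,a) \\
\text{s.t.} \quad & \sum_{a \in A(j)} x(j,a) \;-\; \sum_{s \in S} \sum_{a \in A(s)} p(j \mid s,a)\, x(s,a) = 0 && \forall j \in S, \\
& \sum_{s \in S} \sum_{a \in A(s)} x(s,a) = 1, \\
& \sum_{s \in S} \sum_{a \in A(s)} c_k(s,a)\, x(s,a) \le d_k && k = 1,\dots,K.
\end{aligned}
\]

Here x(s,a) is interpreted as the long-run frequency with which state s is visited and action a is taken; the first constraint set enforces stationarity of these frequencies, the second normalizes them to a probability distribution, and the last imposes the K average-cost constraints.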
