Abstract

This paper investigates monotonicity properties of optimal policies for two-action partially observable Markov decision processes when the underlying (core) state and observation spaces are partially ordered. Motivated by a desirable property of the monotone likelihood ratio order in imperfect information settings, namely that the ordering of beliefs is preserved under conditioning on any new information, we propose a new stochastic order (a generalization of the monotone likelihood ratio order) that is appropriate when the underlying space is partially ordered. The generalization is non-trivial: it requires additional conditions on the elements of the beliefs that correspond to incomparable pairs of states. These stricter conditions reflect a conservation of structure in the problem: the structure lost by relaxing the total ordering of the state space to a partial order must be compensated by stronger conditions on the ordering of beliefs. In addition to the proposed stochastic order, we introduce a class of matrices, termed generalized totally positive of order 2, that are sufficient for preserving the order. Our main result is a set of sufficient conditions that ensure the existence of an optimal policy that is monotone on the belief space with respect to the proposed stochastic order.
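As a point of reference for the property that motivates the paper, the following minimal sketch illustrates the classical monotone likelihood ratio (MLR) order on a totally ordered state space, and checks that the ordering of two beliefs survives a Bayesian update with a shared observation likelihood. It does not implement the generalized order proposed in the paper, whose definition is not given in this abstract. The function names `mlr_geq` and `bayes_update` and the numerical values are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def mlr_geq(b1, b2, tol=1e-12):
    """Return True if belief b1 dominates belief b2 in the classical MLR order.

    Assumes states are totally ordered by their index 0 < 1 < ... < n-1.
    b1 >= b2 in MLR iff b1[j] * b2[i] >= b1[i] * b2[j] for every pair i < j.
    """
    n = len(b1)
    return all(
        b1[j] * b2[i] >= b1[i] * b2[j] - tol
        for i, j in combinations(range(n), 2)
    )


def bayes_update(belief, likelihood):
    """Condition a belief on an observation with per-state likelihoods."""
    unnormalized = [p * l for p, l in zip(belief, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


# Illustrative beliefs over three totally ordered states (made-up numbers).
b_low = [0.5, 0.3, 0.2]
b_high = [0.2, 0.3, 0.5]

# Likelihood of one fixed observation in each state (made-up numbers).
likelihood = [0.7, 0.2, 0.1]

ordered_before = mlr_geq(b_high, b_low)
ordered_after = mlr_geq(
    bayes_update(b_high, likelihood),
    bayes_update(b_low, likelihood),
)
print(ordered_before, ordered_after)
```

In the totally ordered case the comparison reduces to pairwise cross-product inequalities, and multiplying both beliefs elementwise by the same likelihood vector leaves those inequalities intact. According to the abstract, beliefs over a partially ordered space need additional conditions on incomparable pairs of states, which this sketch does not attempt to model.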
