Abstract

Markov Decision Processes (MDPs) and their generalization, Partially Observable MDPs (POMDPs), have been widely studied and used as invaluable tools in dynamic stochastic decision-making. However, two major barriers have limited their application to problems arising in various practical settings: (a) computational challenges for problems with large state or action spaces, and (b) ambiguity in transition probabilities, which are typically hard to quantify. While several solutions have been proposed for the first challenge, known as the curse of dimensionality, the second challenge remains unsolved, and in the case of POMDPs even untouched. We refer to the second challenge as the curse of ambiguity, and address it by developing a generalization of POMDPs termed Ambiguous POMDPs (APOMDPs). The proposed generalization not only allows the decision maker to take imperfect state information into account, but also tackles the inevitable ambiguity about the correct probabilistic model. Importantly, this paper extends various structural results from POMDPs to APOMDPs. Such structural results can guide the decision maker to make robust decisions when facing model ambiguity. Robustness is achieved by using α-maximin expected utility (α-MEU), which (a) differentiates between ambiguity and ambiguity attitude, (b) avoids the over-conservativeness of traditional maximin approaches widely used in robust optimization, and (c) has been found in laboratory experiments to suitably describe various choice behaviors, including those in portfolio selection. The structural results also help to handle the curse of dimensionality, since they significantly simplify the search for an optimal policy. Furthermore, we provide an analytical performance guarantee for the APOMDP approach by developing a bound on its maximum reward loss due to model ambiguity. To generate further insights into how APOMDPs can help to make better decisions, we also discuss specific applications of APOMDPs, including machine replacement, medical decision-making, inventory control, revenue management, optimal search, sequential design of experiments, bandit problems, and dynamic principal-agent models.
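For context, a standard formulation of the α-MEU criterion (in the spirit of Ghirardato et al.'s α-maximin model; the notation V, π, u, and 𝒫 below is illustrative and not taken from the paper itself) evaluates a policy π against a set of plausible probabilistic models 𝒫 as

\[
V(\pi) \;=\; \alpha \,\min_{P \in \mathcal{P}} \mathbb{E}_P\big[u(\pi)\big] \;+\; (1-\alpha)\, \max_{P \in \mathcal{P}} \mathbb{E}_P\big[u(\pi)\big], \qquad \alpha \in [0,1],
\]

where the set 𝒫 captures the ambiguity itself and the weight α captures the decision maker's ambiguity attitude. Setting α = 1 recovers the classical maximin (worst-case) criterion common in robust optimization, so any α < 1 is strictly less conservative, which is the sense in which α-MEU avoids over-conservativeness.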
