Robust Markov Decision Process Research Articles

Markov decision process (MDP) models for the optimal time to initiate a medical therapy, such as organ transplantation, require the estimation of health state transition probabilities from physiological data. Such estimation may be a source of probabilistic ambiguity when, for example, some critical health states are seldom visited historically. For MDP models in general, robust dynamic programming has been proposed as an approach to mitigate the effects of ambiguity on optimal decisions. However, very few real-world studies examining the usefulness of robust MDP policies have been reported. We present a robust MDP model for medical therapy initiation in which worst-case transition probabilities are chosen from a set of probability measures constructed using relative entropy bounds. For this model, we prove that therapy is initiated sooner, in additional states, as the ambiguity increases. We apply the methodology to the problem of deciding when to undergo a living-donor liver transplantation, and present the results of a case study using clinical data. We propose a novel implied confidence level measure that maps the robust solutions to historical transplant decisions, and find that in some cases the robust policies are closer to decisions that have been made in practice.
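
To make the idea of worst-case transition probabilities under a relative entropy bound concrete, the following is a minimal sketch and not the authors' implementation. It assumes a discounted stopping problem with two actions ("wait" or "initiate"), and it evaluates the worst case through the standard one-dimensional dual of a KL-constrained expectation. The function names, the search bracket, and all numbers (P, R_WAIT, R_STOP, beta, gamma) are hypothetical and chosen only for illustration; they are not clinical data.

```python
import math


def worst_case_expectation(p, v, beta, iters=100):
    """Approximate inf E_q[v] over {q : KL(q || p) <= beta} using the dual
    sup_{lam > 0} -lam * log E_p[exp(-v / lam)] - lam * beta."""
    support = [(pi, vi) for pi, vi in zip(p, v) if pi > 0.0]
    nominal = sum(pi * vi for pi, vi in support)
    v_min = min(vi for _, vi in support)
    spread = max(vi for _, vi in support) - v_min
    if beta <= 0.0 or spread < 1e-12:
        return nominal  # no ambiguity, or all reachable values are equal

    def dual(lam):
        # Shift by v_min so every exponent is <= 0 (no overflow).
        s = sum(pi * math.exp(-(vi - v_min) / lam) for pi, vi in support)
        return v_min - lam * math.log(s) - lam * beta

    # Ternary search on the concave dual. The bracket is a heuristic
    # assumption; any dual value is still a valid lower bound.
    lo, hi = 1e-9 * spread, spread * (1.0 + 1.0 / math.sqrt(beta))
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if dual(m1) < dual(m2):
            lo = m1
        else:
            hi = m2
    return min(nominal, max(v_min, dual(0.5 * (lo + hi))))


def robust_stopping_values(P, r_wait, r_stop, beta, gamma=0.95,
                           tol=1e-7, max_iter=2000):
    """Robust value iteration: in each state, either initiate therapy
    (collect r_stop) or wait (collect r_wait, then face the worst-case
    transition row inside the KL ball around the nominal row P[s])."""
    n = len(P)

    def wait_value(s, V):
        return r_wait[s] + gamma * worst_case_expectation(P[s], V, beta)

    V = [float(x) for x in r_stop]
    for _ in range(max_iter):
        V_new = [max(r_stop[s], wait_value(s, V)) for s in range(n)]
        gap = max(abs(a - b) for a, b in zip(V_new, V))
        V = V_new
        if gap < tol:
            break
    policy = ["initiate" if r_stop[s] >= wait_value(s, V) else "wait"
              for s in range(n)]
    return V, policy


# Hypothetical 4-state example (state 0 healthiest, state 3 sickest).
P = [
    [0.80, 0.15, 0.05, 0.00],
    [0.10, 0.70, 0.15, 0.05],
    [0.00, 0.10, 0.70, 0.20],
    [0.00, 0.00, 0.10, 0.90],
]
R_WAIT = [1.0, 0.8, 0.5, 0.1]   # per-period reward while waiting
R_STOP = [8.0, 9.0, 7.0, 3.0]   # total reward once therapy is initiated

if __name__ == "__main__":
    for beta in (0.0, 0.05, 0.2):
        values, policy = robust_stopping_values(P, R_WAIT, R_STOP, beta)
        print(beta, [round(x, 3) for x in values], policy)
```

Setting beta to zero recovers ordinary value iteration on the nominal transition rows, so comparing the printed policies across beta values gives a quick way to see how the ambiguity level changes the set of states in which therapy is initiated for a given set of inputs.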

In a stochastic interdiction game a proliferator aims to minimize the expected duration of a nuclear weapons development project, and an interdictor endeavors to maximize the project duration by delaying some of the project tasks. We formulate static and dynamic versions of the interdictor’s decision problem where the interdiction plan is either precommitted or adapts to new information revealed over time, respectively. The static model gives rise to a stochastic program, whereas the dynamic model is formalized as a multiple optimal stopping problem in continuous time and with decision-dependent information. Under a memoryless probabilistic model for the task durations, we prove that the static model reduces to a mixed-integer linear program, whereas the dynamic model reduces to a finite Markov decision process in discrete time that can be solved via efficient value iteration. We then generalize the dynamic model to account for uncertainty in the outcomes of the interdiction actions. We also discuss a crashing game where the proliferator can use limited resources to expedite tasks so as to counterbalance the interdictor’s efforts. The resulting problem can be formulated as a robust Markov decision process. This paper was accepted by Dimitris Bertsimas, optimization.

Articles published on Robust Markov Decision Process

Approximation of Discounted Minimax Markov Control Problems and Zero-Sum Markov Games Using Hausdorff and Wasserstein Distances

Optimal Threshold Policies for Robust Data Center Control

Living-Donor Liver Transplantation Timing under Ambiguous Health State Transition Probabilities

A Convex Optimization Approach to Distributionally Robust Markov Decision Processes With Wasserstein Distance

Reinforcement Learning in Robust Markov Decision Processes

Interdiction Games on Markovian PERT Networks

Robust decomposable Markov decision processes motivated by allocating school budgets

Robust Markov Decision Processes
