Abstract

This paper considers a multiple stopping problem on an infinite-horizon Hidden Markov model sample path, where a reward that depends on the underlying state is associated with each stop. The decision maker stops L times to maximize the total expected revenue. The aim is to determine the structure of the optimal multiple stopping policy. The formulation generalizes the classical (single) stopping time Partially Observed Markov Decision Process (POMDP) problem. Even though the stopping set (in terms of the Bayesian beliefs) is not necessarily convex, we show that it is a connected set. The structural results are illustrated using a numerical example.
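The Bayesian beliefs mentioned above are computed by the HMM filter. The following is a minimal Python/NumPy sketch, under assumed parameters, of one way such an L-stop problem could be solved numerically: value iteration on a discretized belief for a two-state chain, followed by a check of whether each stopping set forms one contiguous run on the grid. The transition matrix `P`, observation matrix `B`, rewards `r`, discount factor `RHO`, the grid size and the exact form of the recursion are all hypothetical choices for illustration. They are not the paper's model or algorithm. With two states the belief reduces to a scalar, so connected and convex coincide; the distinction drawn in the abstract only appears with three or more states.

```python
import numpy as np

# Hypothetical 2-state HMM (illustrative values, not taken from the paper).
P = np.array([[0.9, 0.1],   # P[i, j] = Pr(x_{k+1} = j | x_k = i)
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],   # B[i, y] = Pr(y_k = y | x_k = i)
              [0.2, 0.8]])
r = np.array([1.0, 3.0])    # assumed state-dependent reward earned at each stop
RHO = 0.9                   # assumed discount factor
L = 3                       # total number of stops


def hmm_filter(pi, y):
    """Bayesian (HMM filter) update: returns T(pi, y) and sigma(pi, y) = Pr(y | pi)."""
    unnorm = B[:, y] * (pi @ P)   # predict with P, then weight by the observation likelihood
    sigma = unnorm.sum()
    return unnorm / sigma, sigma


def multiple_stopping_vi(grid_size=201, iters=300):
    """Value iteration over a belief grid p = Pr(x = 2) for l = 0..L stops left.

    Assumed recursion (a sketch, not the paper's exact formulation), with V_0 = 0:
      V_l(pi) = max( r'pi + RHO * E[V_{l-1}(T(pi, y))],   # stop: earn reward, one stop used
                     RHO * E[V_l(T(pi, y))] )             # continue: no reward
    """
    p = np.linspace(0.0, 1.0, grid_size)
    beliefs = np.stack([1.0 - p, p], axis=1)          # grid of belief vectors
    n_obs = B.shape[1]
    # Precompute filtered beliefs and observation likelihoods for each (grid point, y).
    nxt_p = np.empty((grid_size, n_obs))
    sig = np.empty((grid_size, n_obs))
    for i, pi in enumerate(beliefs):
        for y in range(n_obs):
            nxt, s = hmm_filter(pi, y)
            nxt_p[i, y], sig[i, y] = nxt[1], s
    V = np.zeros((L + 1, grid_size))
    stop = np.zeros((L + 1, grid_size), dtype=bool)
    for _ in range(iters):
        # E[V_l(T(pi, y))] for every l, using linear interpolation on the grid.
        EV = np.array([(sig * np.interp(nxt_p, p, V[l])).sum(axis=1) for l in range(L + 1)])
        V_new = np.zeros_like(V)
        for l in range(1, L + 1):
            q_stop = beliefs @ r + RHO * EV[l - 1]
            q_cont = RHO * EV[l]
            V_new[l] = np.maximum(q_stop, q_cont)
            stop[l] = q_stop >= q_cont                 # grid points where stopping is optimal
        V = V_new
    return p, V, stop


def is_connected(mask):
    """Discrete proxy: on a 1-D grid, a set is connected iff its True entries form one run."""
    starts = np.diff(np.concatenate(([0], mask.astype(int)))) == 1
    return int(starts.sum()) <= 1
```

For example, `p, V, stop = multiple_stopping_vi()` followed by `[is_connected(stop[l]) for l in range(1, L + 1)]` reports, for each number of remaining stops, whether the computed stopping region is a single interval on the grid. This is only a numerical inspection of the hypothetical model above. It is not a substitute for the structural proof.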
