Abstract

In the classical optimal stopping problem, a player is given a sequence of random variables $X_1, \ldots, X_n$ with known distributions. After observing the realization of $X_i$, the player can either accept the observed reward and stop, or reject it and continue to the next variable $X_{i+1}$ in the sequence. Under any fixed ordering of the random variables, an optimal stopping policy, one that maximizes the player's expected reward, is given by the solution of a simple dynamic program. In this paper, we investigate the less-studied question of selecting the order in which the random variables should be observed so as to maximize the expected reward at the stopping time. To demonstrate the benefits of order selection, we prove a novel prophet inequality showing that, when the support of each random variable has size at most 2, the optimal ordering achieves an expected reward within a factor of 1.25 of the expected hindsight maximum; this improves on the corresponding factor of 2 for the worst-case ordering. We also provide a simple $O(n^2)$ algorithm for finding an optimal ordering in this case. Perhaps surprisingly, we demonstrate that a slightly more general case, in which each random variable $X_i$ is restricted to have 3-point support of the form $\{0, m_i, 1\}$, is NP-hard, and we provide an FPTAS for that case.
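To make the fixed-order dynamic program and the effect of ordering concrete, the following is a minimal Python sketch. It is an illustration only and not the paper's algorithm: it assumes independent variables with finite support, represents each variable as a list of `(value, probability)` pairs, and uses hypothetical helper names (`stopping_value`, `best_order_brute_force`, `prophet_value`). The ordering search is plain brute force over all permutations, which is exponential in $n$ and stands in for, rather than reproduces, the $O(n^2)$ algorithm mentioned above.

```python
from itertools import permutations, product

# Assumption: each variable is independent and given as a list of
# (value, probability) pairs whose probabilities sum to 1.

def stopping_value(order):
    """Expected reward of the optimal stopping policy for a fixed order.

    Backward induction: the value before observing a variable is
    E[max(X, continuation)], where continuation is the value of
    rejecting X and moving on (0 after the last variable).
    """
    continuation = 0.0
    for dist in reversed(order):
        continuation = sum(p * max(v, continuation) for v, p in dist)
    return continuation

def best_order_brute_force(variables):
    """Try every ordering (exponential; for tiny illustrative inputs only)."""
    return list(max(permutations(variables), key=stopping_value))

def prophet_value(variables):
    """Expected hindsight maximum E[max_i X_i] by enumerating joint outcomes."""
    total = 0.0
    for outcome in product(*variables):
        prob = 1.0
        for _, p in outcome:
            prob *= p
        total += prob * max(v for v, _ in outcome)
    return total

# Two variables, each with support of size at most 2.
a = [(1.0, 1.0)]              # always 1
b = [(2.0, 0.5), (0.0, 0.5)]  # 2 or 0 with equal probability

print(stopping_value([a, b]))   # observe a first
print(stopping_value([b, a]))   # observe b first
print(prophet_value([a, b]))    # hindsight maximum
```

In this small instance, observing the risky variable `b` first lets the player fall back on the sure reward from `a`, so the ordering `[b, a]` yields a strictly higher expected reward than `[a, b]`.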
