Consider the following sequential sampling problem: at each time, a choice must be made between obtaining an independent sample from one of a set of random reward variables or stopping the sampling. Sampling a random variable incurs a random cost at each time. The objective of the problem is to maximize the expected nett difference between the largest sample reward obtained before stopping and the accumulated costs incurred while sampling. In this paper, the authors prove that the optimal feedback strategies for this problem are index policies and provide an explicit expression for the optimal expected reward from any state. The problem is motivated by search methods for global optimization problems where the cost of computation is explicitly incorporated into the objective.
Read full abstract