Abstract

The connection between the optimal stopping times of American options and multi-armed bandits is a subject of active research. This article investigates the effects of optional stopping in a particular class of multi-armed bandit experiments, namely those that randomly allocate observations to arms in proportion to the Bayesian posterior probability that each arm is optimal (Thompson sampling). We examine the interplay between optional stopping and prior mismatch, and we propose a novel partitioning of regret into peri-testing and post-testing components. We further show that the parameters of interest depend strongly on the assumed prior probability density.
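Thompson sampling, as described in the abstract, can be illustrated with a minimal sketch under several assumptions: Bernoulli rewards with independent Beta priors (Beta(1, 1) by default), an interim look after every batch of 100 observations, and a stopping rule that ends the experiment once one arm's posterior probability of being optimal reaches 0.95. The regret split is also our own illustrative choice, not the paper's definition. Here, peri-testing regret is the expected regret accrued while the experiment runs, and post-testing regret is the regret incurred by sending the remaining traffic, up to a fixed hypothetical `horizon`, to the declared winner. All function and parameter names are hypothetical.

```python
import random


def prob_optimal(successes, failures, prior=(1.0, 1.0), draws=2000, rng=None):
    """Monte Carlo estimate of P(arm i has the highest rate | data), Beta-Bernoulli model."""
    rng = rng or random.Random()
    a0, b0 = prior
    wins = [0] * len(successes)
    for _ in range(draws):
        theta = [rng.betavariate(a0 + s, b0 + f) for s, f in zip(successes, failures)]
        wins[theta.index(max(theta))] += 1
    return [w / draws for w in wins]


def run_experiment(true_rates, prior=(1.0, 1.0), batch=100, threshold=0.95,
                   horizon=10_000, seed=0):
    """Thompson sampling with an interim look after every batch (optional stopping).

    Returns the stopping time n, the declared winner, the final posterior
    probabilities of optimality w, and the regret split into a peri-testing part
    (during the experiment) and a post-testing part (remaining traffic up to
    `horizon` sent to the winner). The split, threshold and batch size are
    hypothetical choices made for illustration.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes, failures = [0] * k, [0] * k
    best = max(true_rates)
    peri_regret, n = 0.0, 0
    w = [1.0 / k] * k  # uniform until the first interim look
    while n < horizon:
        for _ in range(min(batch, horizon - n)):
            # One posterior draw per arm, then play the argmax: each observation
            # goes to an arm with probability P(arm is optimal | data).
            theta = [rng.betavariate(prior[0] + s, prior[1] + f)
                     for s, f in zip(successes, failures)]
            arm = theta.index(max(theta))
            if rng.random() < true_rates[arm]:
                successes[arm] += 1
            else:
                failures[arm] += 1
            peri_regret += best - true_rates[arm]  # expected regret of this pull
            n += 1
        # Interim look: stop once one arm is optimal with probability >= threshold.
        w = prob_optimal(successes, failures, prior, rng=rng)
        if max(w) >= threshold:
            break
    winner = w.index(max(w))
    post_regret = (horizon - n) * (best - true_rates[winner])
    return {"n": n, "winner": winner, "w": w,
            "peri_regret": peri_regret, "post_regret": post_regret}
```

Passing a `prior` that sits far from the true rates, for example `run_experiment([0.04, 0.05], prior=(50.0, 50.0))`, is one simple way to explore how prior mismatch interacts with optional stopping.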


Summary

INTRODUCTION

Sequential testing procedures allow an experiment to stop early, as soon as the data collected so far (possibly arriving as a stream) are sufficient to reach a conclusion. To reduce the dependence on the choice of a single alternative hypothesis, Kulldorff et al. (2011) introduced the maximized sequential probability ratio test (MaxSPRT), in which the alternative hypothesis is composite rather than simple. This paper expands upon prior work (Loecher, 2017) and focuses on the effects of optional stopping on the multi-armed bandit (MAB) procedure outlined by Scott (2010, 2012). Our work touches upon previous research on American options (Bank and Föllmer, 2003), which demonstrates a close connection between multi-armed bandits and optimal stopping times in terms of the Snell envelope of the given payoff process.
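The MaxSPRT replaces the simple alternative of a classical SPRT with a maximum over a composite alternative. As a minimal sketch, assuming Poisson-distributed counts with known expected counts under the null and a one-sided alternative (relative risk greater than 1), the maximized log-likelihood ratio is LLR = μ - c + c ln(c/μ) when c > μ and 0 otherwise, where c is the cumulative observed count and μ the cumulative expected count. The function names and the `critical_value` argument are hypothetical. In practice the critical value must be calibrated for a given significance level and surveillance length, and the value 3.0 used below is only a placeholder.

```python
import math


def poisson_maxsprt_llr(observed, expected):
    """MaxSPRT log-likelihood ratio for Poisson counts, one-sided (relative risk > 1).

    The unknown relative risk is replaced by its maximum-likelihood estimate
    observed / expected; if that estimate is <= 1 the statistic is 0.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    if observed <= expected:
        return 0.0
    return (expected - observed) + observed * math.log(observed / expected)


def monitor(observed_counts, expected_counts, critical_value):
    """Recompute the LLR on the cumulative counts at every look.

    Returns (look, llr) at the first look where llr >= critical_value, else None.
    """
    c = mu = 0.0
    for look, (obs, exp_count) in enumerate(zip(observed_counts, expected_counts), start=1):
        c += obs
        mu += exp_count
        llr = poisson_maxsprt_llr(c, mu)
        if llr >= critical_value:
            return look, llr
    return None
```

For example, `monitor([3, 4, 6], [2, 2, 2], critical_value=3.0)` signals at the third look, where the cumulative LLR is about 3.05. Because the statistic is re-evaluated at every look, the critical value has to account for the repeated testing, the same optional-stopping concern this article examines for bandit experiments.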

MULTI-ARMED BANDITS
Randomized Probability Matching
Regret
Choice of Priors
INFINITE EXPERIMENTS
STOPPED EXPERIMENTS
Calibration
Accuracy and ASN
Findings
CONCLUSION