The Perils of Misspecified Priors and Optional Stopping in Multi-Armed Bandits.

Markus Loecher

doi:10.3389/frai.2021.715690

Abstract

The connection between optimal stopping times of American options and multi-armed bandits is the subject of active research. This article investigates the effects of optional stopping in a particular class of multi-armed bandit experiments, which randomly allocate observations to arms in proportion to the Bayesian posterior probability that each arm is optimal (Thompson sampling). The interplay between optional stopping and prior mismatch is examined. We propose a novel partitioning of regret into peri-testing and post-testing components. We further show a strong dependence of the parameters of interest on the assumed prior probability density.

Highlights

Sequential testing procedures allow an experiment to stop early, once the streaming data collected so far are sufficient to reach a conclusion.
This paper expands upon prior work (Loecher, 2017) and focuses on the effects of optional stopping on the multi-armed bandit (MAB) procedure outlined by Scott (2010, 2012).
Our work touches upon previous research on American options (Bank and Föllmer, 2003) that demonstrates a close connection between multi-armed bandits and optimal stopping times in terms of the Snell envelope of the given payoff process.

Summary

INTRODUCTION

Sequential testing procedures allow an experiment to stop early, once the streaming data collected so far are sufficient to reach a conclusion. To reduce the dependence on the alternative hypothesis, Kulldorff et al. (2011) introduced the maximized sequential probability ratio test (MaxSPRT), in which the alternative hypothesis is composite rather than simple. This paper expands upon prior work (Loecher, 2017) and focuses on the effects of optional stopping on the multi-armed bandit (MAB) procedure outlined by Scott (2010, 2012). Our work touches upon previous research on American options (Bank and Föllmer, 2003) that demonstrates a close connection between multi-armed bandits and optimal stopping times in terms of the Snell envelope of the given payoff process.
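
To make the setting concrete, the following is a minimal sketch of Thompson sampling on Bernoulli arms combined with an optional stopping rule. It is an illustration only, not the procedure or code used in the paper. The Beta prior, the function names (`prob_optimal`, `run_bandit`), the stopping threshold of 0.95, the check interval of 50 observations, and the Monte Carlo estimate of the probability that each arm is optimal are all assumptions made for this example.

```python
import numpy as np


def prob_optimal(alpha, beta, rng, n_draws=2000):
    """Monte Carlo estimate of the posterior probability that each arm is optimal."""
    # One row per simulated posterior draw, one column per arm.
    draws = rng.beta(alpha, beta, size=(n_draws, len(alpha)))
    winners = draws.argmax(axis=1)
    return np.bincount(winners, minlength=len(alpha)) / n_draws


def run_bandit(true_rates, prior_a=1.0, prior_b=1.0, threshold=0.95,
               check_every=50, max_steps=5000, seed=0):
    """Thompson sampling on Bernoulli arms, stopped once one arm looks optimal.

    prior_a and prior_b define the Beta(prior_a, prior_b) prior shared by all
    arms (hypothetical parameters; changing them lets one probe prior mismatch).
    """
    rng = np.random.default_rng(seed)
    k = len(true_rates)
    alpha = np.full(k, prior_a, dtype=float)  # prior + observed successes
    beta = np.full(k, prior_b, dtype=float)   # prior + observed failures

    for t in range(1, max_steps + 1):
        # Draw one sample per arm from its posterior and play the largest.
        # This allocates observations in proportion to the posterior
        # probability that each arm is optimal.
        arm = int(rng.beta(alpha, beta).argmax())
        reward = int(rng.random() < true_rates[arm])
        alpha[arm] += reward
        beta[arm] += 1 - reward

        # Optional stopping: periodically check whether any arm is
        # optimal with posterior probability above the threshold.
        if t % check_every == 0:
            p_opt = prob_optimal(alpha, beta, rng)
            if p_opt.max() >= threshold:
                return {"stopped_at": t, "winner": int(p_opt.argmax()), "p_opt": p_opt}

    p_opt = prob_optimal(alpha, beta, rng)
    return {"stopped_at": max_steps, "winner": int(p_opt.argmax()), "p_opt": p_opt}


if __name__ == "__main__":
    result = run_bandit([0.05, 0.15], seed=1)
    print(result["stopped_at"], result["winner"], result["p_opt"])
```

Rerunning `run_bandit` with different values of `prior_a` and `prior_b` while holding `true_rates` fixed gives a simple way to see how the stopping time and the declared winner respond to the assumed prior.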

MULTI-ARMED BANDITS

Randomized Probability Matching

Regret

Choice of Priors

INFINITE EXPERIMENTS

STOPPED EXPERIMENTS

Calibration

Accuracy and ASN

Findings

CONCLUSION


Journal: Frontiers in Artificial Intelligence
Publication Date: Jul 9, 2021
Citations: 1
License type: CC BY 4.0


Similar Papers

A note on the advantage of context in Thompson sampling
Michael Byrd ... Ross Darrow
01 Jan 2023

A note on the advantage of context in Thompson sampling
Michael Byrd ... Ross Darrow
Journal of Revenue and Pricing Management | VOL. 20
24 Mar 2021

Thompson Sampling for Dynamic Multi-armed Bandits
Neha Gupta ... Ole-Christoffer Granmo
01 Dec 2011

In-depth Exploration and Implementation of Multi-Armed Bandit Models Across Diverse Fields
Jiazhen Wu
Highlights in Science, Engineering and Technology | VOL. 94
26 Apr 2024
