Good arm identification via bandit feedback

Hideaki Kano,Kentaro Sakamaki,Kentaro Matsuura,Atsuyoshi Nakamura,Masashi Sugiyama,Junya Honda

doi:10.1007/s10994-019-05784-4

Hideaki Kano, Kentaro Sakamaki + Show 4 more

Open Access

https://doi.org/10.1007/s10994-019-05784-4

Copy DOI

Abstract

We consider a novel stochastic multi-armed bandit problem called good arm identification (GAI), where a good arm is defined as an arm with expected reward greater than or equal to a given threshold. GAI is a pure-exploration problem in which a single agent repeats a process of outputting an arm as soon as it is identified as a good one before confirming the other arms are actually not good. The objective of GAI is to minimize the number of samples for each process. We find that GAI faces a new kind of dilemma, the exploration-exploitation dilemma of confidence, which is different from the best arm identification. As a result, an efficient design of algorithms for GAI is quite different from that for the best arm identification. We derive a lower bound on the sample complexity of GAI that is tight up to the logarithmic factor mathrm {O}(log frac{1}{delta }) for acceptance error rate delta . We also develop an algorithm whose sample complexity almost matches the lower bound. We also confirm experimentally that our proposed algorithm outperforms naive algorithms in synthetic settings based on a conventional bandit problem and clinical trial researches for rheumatoid arthritis.

Full Text