Abstract

An experiment is considered which consists of a series of trials. At each trial one of two treatments must be used, and its outcome, 'success' or 'failure', is known immediately; this setting is often referred to as a 'two-armed bandit'. The aim is a rule for choosing the treatment at each trial that meets, as far as possible, two objectives: (a) to maximize the use of the better treatment, and (b) to minimize the probability of wrongly identifying the better treatment at the end of the experiment. A number of such rules are compared by computer simulation, and an easy-to-use rule based on a dynamic allocation index is found to perform well over a wide range of model parameters.
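The comparison described above can be sketched as a small Monte Carlo simulation that scores each allocation rule on the two stated objectives: the mean share of trials given to the better treatment, and the probability of naming the wrong treatment as better at the end. This is a minimal sketch under stated assumptions, not the paper's method. The rules `equal_allocation`, `play_the_winner` and `greedy_posterior`, the uniform Beta(1, 1) prior, the end-of-experiment decision (larger posterior mean, ties to treatment 0), the horizon of 50 trials and the success probabilities are all hypothetical choices. In particular, `greedy_posterior` is a simple myopic rule and is not the dynamic allocation (Gittins) index, whose computation requires solving an optimal stopping problem.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical representation: history of (treatment, outcome), outcome 1 = success.
History = List[Tuple[int, int]]
Rule = Callable[[History], int]


def equal_allocation(history: History) -> int:
    # Alternate treatments 0, 1, 0, 1, ... regardless of outcomes.
    return len(history) % 2


def play_the_winner(history: History) -> int:
    # Repeat the last treatment after a success, switch after a failure.
    if not history:
        return 0
    arm, outcome = history[-1]
    return arm if outcome == 1 else 1 - arm


def posterior_mean(history: History, arm: int) -> float:
    # Mean of the Beta(1 + s, 1 + f) posterior under an assumed uniform prior.
    successes = sum(o for a, o in history if a == arm)
    uses = sum(1 for a, _ in history if a == arm)
    return (1 + successes) / (2 + uses)


def greedy_posterior(history: History) -> int:
    # Myopic rule (NOT the Gittins index): use the larger posterior mean.
    return 0 if posterior_mean(history, 0) >= posterior_mean(history, 1) else 1


def run_experiment(rule: Rule, p: Tuple[float, float], n_trials: int,
                   rng: random.Random) -> Tuple[float, bool]:
    # Simulate one experiment; p holds the two true success probabilities.
    history: History = []
    for _ in range(n_trials):
        arm = rule(history)
        outcome = 1 if rng.random() < p[arm] else 0
        history.append((arm, outcome))
    better = 0 if p[0] > p[1] else 1
    share_better = sum(1 for a, _ in history if a == better) / n_trials
    # Assumed terminal decision: declare the treatment with larger posterior mean.
    declared = 0 if posterior_mean(history, 0) >= posterior_mean(history, 1) else 1
    return share_better, declared != better


def evaluate(rule: Rule, p: Tuple[float, float], n_trials: int = 50,
             n_reps: int = 2000, seed: int = 1) -> Tuple[float, float]:
    # Estimate objective (a) mean share on better arm and (b) P(wrong choice).
    if p[0] == p[1]:
        raise ValueError("the two success probabilities must differ")
    rng = random.Random(seed)
    total_share, errors = 0.0, 0
    for _ in range(n_reps):
        share, wrong = run_experiment(rule, p, n_trials, rng)
        total_share += share
        errors += wrong
    return total_share / n_reps, errors / n_reps


if __name__ == "__main__":
    p = (0.6, 0.4)  # hypothetical success probabilities
    for name, rule in [("equal allocation", equal_allocation),
                       ("play-the-winner", play_the_winner),
                       ("greedy posterior mean", greedy_posterior)]:
        share, err = evaluate(rule, p)
        print(f"{name}: share on better = {share:.3f}, P(wrong) = {err:.3f}")
```

The sketch makes the trade-off between the two objectives visible: equal allocation spends exactly half the trials on each treatment, which tends to help identification, while outcome-driven rules shift trials toward the apparently better treatment at some risk to the final decision.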
