Abstract

Multi-armed bandits are simple yet powerful methods for choosing actions that maximize a reward within a limited number of trials. An early phase of a dose-finding clinical trial must identify the maximum tolerated dose among multiple candidate doses through repeated dose assignment. We consider applying the strong selection performance of multi-armed bandits to dose-finding clinical designs. Among multi-armed bandit methods, we first consider Thompson sampling, which determines actions based on random samples from a posterior distribution. With the small sample sizes typical of dose-finding trials, the tails of the posterior distribution are heavy and the random samples are too variable, so we also consider applying a regularized Thompson sampling and a greedy algorithm. The greedy algorithm determines a dose based on the posterior mean. In addition, we propose a method that determines a dose based on the posterior mode. We evaluate the performance of the proposed designs across nine scenarios via simulation studies.
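The three selection rules mentioned above (random posterior draw, posterior mean, posterior mode) can be illustrated with a minimal sketch. This is not the paper's design: it assumes an independent Beta-Bernoulli toxicity model per dose and a target toxicity rate of 0.3, both of which are illustrative assumptions, and it selects the dose whose statistic is closest to the target.

```python
import random

# Hypothetical sketch, NOT the paper's exact design: each dose has an
# independent Beta(a, b) posterior over its toxicity probability, and
# we pick the dose whose statistic is closest to a target toxicity.
TARGET = 0.3  # assumed target toxicity rate (illustrative)

def thompson_select(posts, rng):
    """Thompson sampling: one random draw from each posterior."""
    samples = [rng.betavariate(a, b) for a, b in posts]
    return min(range(len(posts)), key=lambda i: abs(samples[i] - TARGET))

def greedy_select(posts):
    """Greedy rule: use the posterior mean a / (a + b) instead of a draw."""
    means = [a / (a + b) for a, b in posts]
    return min(range(len(posts)), key=lambda i: abs(means[i] - TARGET))

def mode_select(posts):
    """Mode-based rule: posterior mode (a - 1)/(a + b - 2) for a, b > 1."""
    modes = [(a - 1) / (a + b - 2) for a, b in posts]
    return min(range(len(posts)), key=lambda i: abs(modes[i] - TARGET))

# Illustrative (alpha, beta) counts for four doses, low to high toxicity.
posts = [(2, 9), (3, 8), (5, 6), (8, 3)]
rng = random.Random(0)
print(thompson_select(posts, rng), greedy_select(posts), mode_select(posts))
```

The mean- and mode-based rules are deterministic given the posteriors, which reduces the draw-to-draw variability that makes plain Thompson sampling unstable at small sample sizes.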
