We investigate the optimal allocation design for response-adaptive clinical trials under the average reward criterion. The treatment randomization process is formulated as a Markov decision process, and a Bayesian method is used to summarize the information on treatment effects. A span-contraction operator is introduced, and the average reward generated by the policy identified by the operator is shown to converge to the optimal value. We propose an algorithm that approximates the optimal treatment allocation using Thompson sampling and the contraction operator. For the scenario of two treatments with binary responses and a sample size of 200 patients, simulation results demonstrate efficient learning features of the proposed method: it allocates a high proportion of patients to the better treatment while retaining good statistical power and a small probability that the trial goes in the undesired direction. When the difference in success probabilities to be detected is 0.2, the probability that a trial goes in the unfavorable direction is < 1.5%, which decreases further to < 0.9% when the difference to be detected is 0.3. For normally distributed responses, with a sample size of 100 patients, the proposed method assigns 13% more patients to the better treatment than traditional complete randomization when detecting a difference of effect size 0.8, with good statistical power and a < 0.7% probability that the trial goes in the undesired direction.
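As a rough illustration of the Thompson sampling ingredient only, the following minimal sketch simulates a two-treatment trial with binary responses and 200 patients. It is plain Beta-Bernoulli Thompson sampling and does not implement the span-contraction operator or the proposed algorithm. The function name `thompson_allocate`, the uniform Beta(1, 1) priors, and the example success probabilities are assumptions made for illustration, not details taken from the paper.

```python
import random


def thompson_allocate(p_true, n_patients=200, seed=0):
    """Simulate plain Thompson sampling for a two-arm binary-response trial.

    p_true: assumed true success probabilities (arm 0, arm 1); hypothetical inputs.
    Returns (counts, alpha, beta): patients per arm and Beta posterior parameters.
    """
    rng = random.Random(seed)
    # Assumed Beta(1, 1) (uniform) prior on each arm's success probability.
    alpha = [1, 1]
    beta = [1, 1]
    counts = [0, 0]
    for _ in range(n_patients):
        # Draw one sample from each arm's current posterior.
        draws = [rng.betavariate(alpha[k], beta[k]) for k in range(2)]
        # Assign the next patient to the arm with the larger sampled value.
        arm = 0 if draws[0] >= draws[1] else 1
        # Simulate the patient's binary response under the assumed truth.
        success = rng.random() < p_true[arm]
        # Conjugate update of the chosen arm's Beta posterior.
        if success:
            alpha[arm] += 1
        else:
            beta[arm] += 1
        counts[arm] += 1
    return counts, alpha, beta


if __name__ == "__main__":
    # Hypothetical scenario: success probabilities differing by 0.2.
    counts, alpha, beta = thompson_allocate((0.5, 0.3))
    print("patients per arm:", counts)
    print("posterior means:", [a / (a + b) for a, b in zip(alpha, beta)])
```

Repeating such a simulation over many seeds would give an empirical allocation proportion for this simplified rule; the figures quoted in the abstract refer to the proposed method, not to this sketch.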