Abstract
The multi-armed bandit problem is a classic example of the exploration-exploitation trade-off and is well suited to modeling sequential resource allocation under uncertainty. One of its typical motivating applications is adaptive design in clinical trials, in which the trial's course is modified in accordance with a pre-specified objective by using the results that accumulate during the trial. Since the response to a procedure in a clinical trial is not immediate, multi-armed bandit policies must be adapted to delays to retain their theoretical guarantees. In this work, we show the importance of such adaptation by evaluating policies on the publicly available dataset of the International Stroke Trial, a randomized trial of aspirin and subcutaneous heparin among 19,435 patients with acute ischaemic stroke. In addition to the adapted policies, we analyze the Upper Confidence Bound policy with beta feedback, which mitigates delays when certain evidence of successful treatment becomes available within a relatively short period after the procedure.
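To make the role of delays concrete, the following is a minimal illustrative sketch, not the implementation evaluated in this work. It assumes a fixed delay (in rounds) between a pull and the moment its binary outcome becomes visible, and it computes a UCB1-style index from observed outcomes only. The function name `delayed_ucb1` and its parameters (`success_probs`, `horizon`, `delay`, `seed`) are hypothetical and chosen for illustration.

```python
import math
import random
from collections import deque


def delayed_ucb1(success_probs, horizon, delay, seed=0):
    """Simulate a UCB1-style policy whose rewards arrive `delay` rounds late.

    Assumption: the delay is fixed and known only implicitly; the policy
    simply ignores outcomes that have not yet been observed.
    Returns the number of pulls allocated to each arm.
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    pulls = [0] * n_arms            # all pulls, observed or not
    observed_pulls = [0] * n_arms   # pulls whose outcome has arrived
    observed_sums = [0.0] * n_arms  # sum of outcomes that have arrived
    pending = deque()               # (arrival_round, arm, reward), in arrival order

    for t in range(1, horizon + 1):
        # Reveal every outcome whose delay has elapsed.
        while pending and pending[0][0] <= t:
            _, arm, reward = pending.popleft()
            observed_pulls[arm] += 1
            observed_sums[arm] += reward

        unobserved = [a for a in range(n_arms) if observed_pulls[a] == 0]
        if unobserved:
            # No feedback yet for some arm: spread pulls evenly among them.
            arm = min(unobserved, key=lambda a: pulls[a])
        else:
            total = sum(observed_pulls)

            def index(a):
                mean = observed_sums[a] / observed_pulls[a]
                bonus = math.sqrt(2.0 * math.log(total) / observed_pulls[a])
                return mean + bonus

            arm = max(range(n_arms), key=index)

        # Draw a Bernoulli outcome now, but make it visible only later.
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        pulls[arm] += 1
        pending.append((t + delay, arm, reward))

    return pulls
```

Calling, for example, `delayed_ucb1([0.4, 0.6], horizon=5000, delay=0)` and then repeating the call with a larger `delay` gives a rough sense of how many additional patients are allocated to the inferior arm while outcomes are still pending.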