Multi-armed Bandit Research Articles

Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in healthcare and the tech industry. They involve online learning algorithms that adaptively learn policies over time to map observed contexts X_t to actions A_t in an attempt to maximize stochastic rewards R_t. This adaptivity raises interesting but hard statistical inference questions, especially counterfactual ones: for example, it is often of interest to estimate the properties of a hypothetical policy that is different from the logging policy that was used to collect the data, a problem known as “off-policy evaluation” (OPE). Using modern martingale techniques, we present a comprehensive framework for OPE inference that relaxes unnecessary conditions made in some past works (such as performing inference at prespecified sample sizes, uniformly bounded importance weights, constant logging policies, and constant policy values, among others), significantly improving on them both theoretically and empirically. Importantly, our methods can be employed while the original experiment is still running (that is, not necessarily post hoc), when the logging policy may be itself changing (due to learning), and even if the context distributions are a highly dependent time series (such as if they are drifting over time). More concretely, we derive confidence sequences for various functionals of interest in OPE. These include doubly robust ones for time-varying off-policy mean reward values, but also confidence bands for the entire cumulative distribution function of the off-policy reward distribution. All of our methods (a) are valid at arbitrary stopping times; (b) only make nonparametric assumptions; (c) do not require importance weights to be uniformly bounded, and if they are, we do not need to know these bounds; and (d) adapt to the empirical variance of our estimators. In summary, our methods enable anytime-valid off-policy inference using adaptively collected contextual bandit data.
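To make the off-policy evaluation setting concrete, the following is a minimal sketch of the standard importance-weighted (inverse propensity) point estimate of a target policy's mean reward, recomputed after every logged round. It is not the confidence-sequence construction of the abstract above, which is not specified here. The log layout and the names `running_ipw_estimates` and `target_prob` are assumptions made for illustration. The logging probability is stored per round, so the logging policy is allowed to change over time.

```python
from typing import Callable, Hashable, List, Tuple

# One logged round: (context X_t, action A_t, reward R_t,
# probability with which the logging policy chose A_t given X_t).
# This record layout is a hypothetical choice for the sketch.
LoggedRound = Tuple[Hashable, Hashable, float, float]


def running_ipw_estimates(
    log: List[LoggedRound],
    target_prob: Callable[[Hashable, Hashable], float],
) -> List[float]:
    """Return the importance-weighted value estimate after each round.

    target_prob(context, action) is the probability that the hypothetical
    target policy would take `action` in `context`.
    """
    estimates = []
    weighted_sum = 0.0
    for t, (context, action, reward, logging_prob) in enumerate(log, start=1):
        if logging_prob <= 0.0:
            raise ValueError("logged actions must have positive probability")
        # Importance weight: how much more (or less) likely the target
        # policy is to take the logged action than the logging policy was.
        weight = target_prob(context, action) / logging_prob
        weighted_sum += weight * reward
        estimates.append(weighted_sum / t)
    return estimates


# Usage: a target policy that always plays action 1.
always_one = lambda context, action: 1.0 if action == 1 else 0.0
toy_log = [(0, 1, 1.0, 0.5), (0, 0, 0.0, 0.5)]
print(running_ipw_estimates(toy_log, always_one))
```

The running estimate is only a point estimate. Quantifying its uncertainty at every time step simultaneously is the problem that confidence sequences address.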

Background: Microbiome dysbiosis has recently been associated with different diseases and disorders. In this context, machine learning (ML) approaches can be useful either to identify new patterns or learn predictive models. However, data to be fed to ML methods can be subject to different sampling, sequencing and preprocessing techniques. Each different choice in the pipeline can lead to a different view (i.e., feature set) of the same individuals, that classical (single-view) ML approaches may fail to simultaneously consider. Moreover, some views may be incomplete, i.e., some individuals may be missing in some views, possibly due to the absence of some measurements or to the fact that some features are not available/applicable for all the individuals. Multi-view learning methods can represent a possible solution to consider multiple feature sets for the same individuals, but most existing multi-view learning methods are limited to binary classification tasks or cannot work with incomplete views.

Results: We propose irBoost.SH, an extension of the multi-view boosting algorithm rBoost.SH, based on multi-armed bandits. irBoost.SH solves multi-class classification tasks and can analyze incomplete views. At each iteration, it identifies one winning view using adversarial multi-armed bandits and uses its predictions to update a shared instance weight distribution in a learning process based on boosting. In our experiments, performed on 5 multi-view microbiome datasets, the model learned by irBoost.SH always outperforms the best model learned from a single view, its closest competitor rBoost.SH, and the model learned by a multi-view approach based on feature concatenation, reaching an improvement of 11.8% of the F1-score in the prediction of the Autism Spectrum disorder and of 114% in the prediction of the Colorectal Cancer disease.

Conclusions: The proposed method irBoost.SH exhibited outstanding performances in our experiments, also compared to competitor approaches. The obtained results confirm that irBoost.SH can fruitfully be adopted for the analysis of microbiome data, due to its capability to simultaneously exploit multiple feature sets obtained through different sequencing and preprocessing pipelines.
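The idea of picking one winning view per iteration with an adversarial bandit can be illustrated with Exp3, a standard adversarial multi-armed bandit algorithm. The abstract does not say which bandit algorithm or reward signal irBoost.SH uses, so the sketch below is a generic Exp3 loop, not the authors' method. The reward here is a simulated stand-in for "how useful the chosen view's learner was this round", and all function names and the per-view success rates are hypothetical.

```python
import math
import random
from typing import List


def exp3_probabilities(weights: List[float], gamma: float) -> List[float]:
    """Mix the normalized weights with uniform exploration (rate gamma)."""
    k = len(weights)
    total = sum(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]


def exp3_update(weights: List[float], probs: List[float], chosen: int,
                reward: float, gamma: float) -> List[float]:
    """Return new weights after observing `reward` in [0, 1] for `chosen`."""
    k = len(weights)
    # Importance-weighted reward estimate: only the chosen arm is observed.
    estimated_reward = reward / probs[chosen]
    updated = list(weights)
    updated[chosen] *= math.exp(gamma * estimated_reward / k)
    return updated


def select_views(view_quality: List[float], rounds: int, gamma: float,
                 seed: int = 0) -> List[int]:
    """Run Exp3 over views; view_quality holds hypothetical success rates."""
    rng = random.Random(seed)
    weights = [1.0] * len(view_quality)
    counts = [0] * len(view_quality)
    for _ in range(rounds):
        probs = exp3_probabilities(weights, gamma)
        chosen = rng.choices(range(len(weights)), weights=probs)[0]
        # Simulated feedback: 1.0 if the chosen view "did well" this round.
        reward = 1.0 if rng.random() < view_quality[chosen] else 0.0
        weights = exp3_update(weights, probs, chosen, reward, gamma)
        counts[chosen] += 1
    return counts


# Usage: three views, the last one being the most informative on average.
print(select_views([0.2, 0.5, 0.8], rounds=500, gamma=0.1))
```

In a boosting setting, the chosen view's learner would be trained at each round and its predictions used to update the shared instance weights; that step is omitted here because the abstract gives no details about it.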

Articles published on Multi-armed Bandit

Enhancing movie recommendations through comparative analysis of UCB algorithm variants

Selecting workers like expert for crowdsourcing by integration evaluation of individual and collaborative abilities

Interactive preference analysis: A reinforcement learning framework

Multi-armed bandit approach for mean field game-based resource allocation in NOMA networks

Cache content placement in the presence of fictitious requests in mmWave 5G IAB networks

Exploration–Exploitation Mechanisms in Recurrent Neural Networks and Human Learners in Restless Bandit Problems

Transmission scheduling of P2P real-time communication based on restless multi-armed bandit

Corruption-Robust Exploration in Episodic Reinforcement Learning

Client selection for federated learning using combinatorial multi-armed bandit under long-term energy constraint

Anytime-valid off-policy Inference for Contextual Bandits

Personalized Image Generation Through Swiping

A new Hyper-heuristic based on Adaptive Simulated Annealing and Reinforcement Learning for the Capacitated Electric Vehicle Routing Problem

Exploring Multi-Armed Bandit (MAB) as an AI Tool for Optimising GMA-WAAM Path Planning

Multi‐armed bandit based online model selection for concept‐drift adaptation

Multi-class boosting for the analysis of multiple incomplete views on microbiome data

Bandit approach for unmanned aerial vehicle-centric low earth orbit satellite selection

Approximate information for efficient exploration-exploitation strategies.

An algorithm for multi-armed bandit based on variance change sensitivity

In-depth Exploration and Implementation of Multi-Armed Bandit Models Across Diverse Fields

Enhancing UCB-tuned and Asymptotically Optimal UCB Algorithms through Weighted Average Techniques in Multi-Armed Bandit Scenarios
