Bandit Algorithm Research Articles

As natural disasters become more frequent, the study of emergency communication becomes increasingly important. In this scenario, federated learning (FL) lets devices collaborate while protecting privacy, which can greatly improve system performance. In complex geographic environments, the flexible mobility and large communication radius of unmanned aerial vehicles (UAVs) make them ideal auxiliary devices for wireless communication, and a UAV used as a mobile base station can provide more stable communication signals. However, ground-based IoT terminals are numerous and densely distributed, so if all of them transmit data to the UAV, its limited energy prevents it from handling all of the computation and communication tasks. In addition, the many terrestrial devices compete for spectrum resources, and having every device transmit data causes an extreme resource shortage that degrades model performance. This would cause irreparable harm to rescue efforts in the disaster area and seriously threaten the lives of vulnerable and injured people. Therefore, we use user scheduling to select a subset of terrestrial devices to participate in the FL process. To avoid the resources wasted on predicting terrestrial device resources, we use the multi-armed bandit (MAB) algorithm to evaluate devices. To address fairness in selection, we replace the single criterion with multiple criteria, using a weighting of model freshness and energy consumption as the reward function. Simulations on the datasets show that our approach is state of the art.
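The abstract above does not give the reward formula or the specific bandit rule, so the following is only a minimal sketch of how bandit-based device scheduling with a weighted reward could look. The names `weighted_reward`, `select_devices`, `update`, the weight `w_fresh`, the normalization of both criteria to [0, 1], and the use of a UCB1-style index are all assumptions made for illustration, not the authors' method.

```python
import math

def weighted_reward(freshness, energy, w_fresh=0.5):
    """Hypothetical reward: inputs are assumed normalized to [0, 1].

    Higher model freshness and lower energy consumption both raise the reward.
    """
    return w_fresh * freshness + (1.0 - w_fresh) * (1.0 - energy)

def select_devices(counts, means, t, k):
    """Return the k devices with the largest UCB1-style index at round t >= 1.

    Devices that were never scheduled get an infinite index, so every
    device is tried at least once before exploitation starts.
    """
    def index(d):
        if counts[d] == 0:
            return math.inf
        return means[d] + math.sqrt(2.0 * math.log(t) / counts[d])
    return sorted(range(len(counts)), key=index, reverse=True)[:k]

def update(counts, means, device, reward):
    """Incrementally update the running mean reward of one device."""
    counts[device] += 1
    means[device] += (reward - means[device]) / counts[device]
```

In each FL round, a scheduler along these lines would call `select_devices`, let the chosen devices train and upload, compute `weighted_reward` from their observed freshness and energy use, and pass the result to `update`.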


The Multi-armed Bandit algorithm, which addresses the exploration-exploitation trade-off, gives businesses a robust tool for allocating resources in line with customer preferences. However, different types of Multi-armed Bandit algorithms perform differently depending on the context, so a series of experiments that vary the input values across distinct algorithms is needed. In this study, three algorithms were applied to the widely used MovieLens dataset to gauge their effectiveness: Explore-then-commit (ETC), Upper Confidence Bound (UCB) together with its asymptotically optimal variant, and Thompson Sampling (TS). The algorithms were implemented in code, and their performance was plotted in multiple figures. Algorithmic performance was examined by tracking cumulative regret under defined conditions, laying the groundwork for subsequent parameter-based comparisons. A dedicated experimentation framework was devised to evaluate the robustness of each algorithm through deliberate parameter adjustments and tailored experiments. The resulting plots showed that Thompson Sampling consistently achieved the lowest regret in most scenarios, while the UCB algorithms remained stable. ETC performed well with a small number of runs, but its regret escalated significantly as the number of runs grew, which warrants constraining the exploration phase to mitigate regret. This investigation underscores the efficacy of Multi-armed Bandit algorithms while clarifying how their behavior differs across contexts.
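To make the comparison above concrete, here is a minimal sketch of the three algorithm families with cumulative regret tracking. It assumes synthetic Bernoulli arms rather than the MovieLens data, uses the standard UCB1 index rather than the asymptotically optimal variant, and gives Thompson Sampling a Beta(1, 1) prior; the function names and parameters are illustrative choices, not taken from the study.

```python
import math
import random

def run_bandit(policy, means, horizon, rng):
    """Play `policy` on Bernoulli arms; return cumulative pseudo-regret."""
    k = len(means)
    counts = [0] * k    # number of pulls per arm
    sums = [0.0] * k    # total observed reward per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        arm = policy(t, counts, sums, rng)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # gap of the arm actually played
    return regret

def make_etc(m):
    """Explore-then-commit: pull each arm m >= 1 times, then commit.

    Returns a fresh policy; create a new one for every run because it
    remembers the committed arm.
    """
    state = {"arm": None}
    def policy(t, counts, sums, rng):
        k = len(counts)
        if t <= m * k:
            return (t - 1) % k  # round-robin exploration
        if state["arm"] is None:
            state["arm"] = max(range(k), key=lambda a: sums[a] / counts[a])
        return state["arm"]
    return policy

def ucb1(t, counts, sums, rng):
    """UCB1: empirical mean plus sqrt(2 ln t / n) exploration bonus."""
    for a, n in enumerate(counts):
        if n == 0:
            return a  # play every arm once first
    return max(
        range(len(counts)),
        key=lambda a: sums[a] / counts[a]
        + math.sqrt(2.0 * math.log(t) / counts[a]),
    )

def thompson(t, counts, sums, rng):
    """Thompson Sampling for Bernoulli rewards with a Beta(1, 1) prior."""
    return max(
        range(len(counts)),
        key=lambda a: rng.betavariate(1.0 + sums[a], 1.0 + counts[a] - sums[a]),
    )
```

A comparison along the lines described could call, for example, `run_bandit(make_etc(20), [0.2, 0.5, 0.8], 5000, random.Random(0))` and the same with `ucb1` and `thompson`, then average the returned regret over several seeds and sweep the ETC exploration length `m`.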


Articles published on Bandit Algorithm

Adaptive KL-UCB Based Bandit Algorithms for Markovian and I.I.D. Settings

Assessing the robustness of Multi-Armed Bandit algorithms against biased initialization

Responsible Bandit Learning via Privacy-Protected Mean-Volatility Utility

Hierarchize Pareto Dominance in Multi-Objective Stochastic Linear Bandits

Forced Exploration in Bandit Problems

Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits

Combinatorial Stochastic-Greedy Bandit

Adaptive Anytime Multi-Agent Path Finding Using Bandit-Based Large Neighborhood Search

Robustly Improving Bandit Algorithms with Confounded and Selection Biased Offline Data: A Causal Approach

Mixed-Effects Contextual Bandits

Intelligent Caching for Vehicular Dew Computing in Poor Network Connectivity Environments

A Fairness-Enhanced Federated Learning Scheduling Mechanism for UAV-Assisted Emergency Communication

Improvement of the recommendation system based on the multi-armed bandit algorithm

Survey of dynamic pricing based on Multi-Armed Bandit algorithms

Investigation of progress and application related to Multi-Armed Bandit algorithms

An investigation of progress related to stochastic stationary bandit algorithms

Investigation of selection and application of Multi-Armed Bandit algorithms in recommendation system

Exploring Multi-Armed Bandit algorithms: Performance analysis in dynamic environments

The investigation related to the influence of dimension manipulation on regret performance based on the upper confidence bound algorithm

Investigation of frontier Multi-Armed Bandit algorithms and applications

