This manuscript offers an exhaustive comparative study of key Multi-Armed Bandit (MAB) algorithms, exploring their challenges, potential resolutions, and varied applications in contemporary settings. It concentrates on three principal algorithms: the Upper Confidence Bound (UCB), Thompson Sampling, and ε-greedy. The analysis critically assesses their performance in terms of convergence rate, precision, and computational demands. Key challenges in non-stationary environments and large-scale deployments are identified, including the complexities of multi-objective optimization. The paper proposes solutions such as adaptive algorithmic approaches and the integration of parallel computing frameworks to address these challenges. It further surveys a range of application domains, from online advertising and recommendation systems to clinical trial methodologies, drawing comparisons between traditional and novel applications. The discussion also covers critical issues such as data scarcity, cold-start problems, ethical considerations in algorithm design, and the intricacies of processing real-time data. In its concluding sections, the study reviews recent successful deployments of MAB algorithms, identifying the core factors behind their effectiveness and forecasting future developmental trajectories. This analysis provides a detailed overview of the MAB field, highlighting its significance and practical impact across sectors.
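For readers unfamiliar with the three policies the abstract names, the following is a minimal illustrative sketch (not code from the paper) of ε-greedy, UCB1, and Thompson Sampling on a Bernoulli bandit; the arm success probabilities and all parameter values are hypothetical.

```python
import math
import random

def eps_greedy(values, eps=0.1):
    # With probability eps explore a uniformly random arm; otherwise
    # exploit the arm with the highest estimated mean reward.
    if random.random() < eps:
        return random.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])

def ucb1(counts, values, t):
    # Pull each arm once, then pick the arm maximizing the empirical
    # mean plus a confidence bonus that shrinks with more pulls.
    for a, n in enumerate(counts):
        if n == 0:
            return a
    return max(range(len(values)),
               key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]))

def thompson(successes, failures):
    # Sample each arm's success rate from its Beta posterior and
    # play the arm with the largest sampled value.
    draws = [random.betavariate(s + 1, f + 1)
             for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda a: draws[a])

def run(policy, probs, horizon=2000, seed=0):
    # Simulate one policy on a Bernoulli bandit; returns total reward.
    random.seed(seed)
    k = len(probs)
    counts, values = [0] * k, [0.0] * k
    succ, fail = [0] * k, [0] * k
    total = 0
    for t in range(1, horizon + 1):
        if policy == "eps":
            a = eps_greedy(values)
        elif policy == "ucb":
            a = ucb1(counts, values, t)
        else:
            a = thompson(succ, fail)
        r = 1 if random.random() < probs[a] else 0
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental mean update
        succ[a] += r
        fail[a] += 1 - r
        total += r
    return total

probs = [0.2, 0.5, 0.8]  # hypothetical arm success rates
for p in ("eps", "ucb", "ts"):
    print(p, run(p, probs))
```

All three policies should concentrate their pulls on the best arm (success rate 0.8) and collect substantially more reward than uniform play; how quickly they do so reflects the convergence-rate and computational trade-offs the survey compares.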