Abstract. Originating from the scenario of slot machines in casinos, the Multi-Armed Bandit problem seeks to optimize sequential decision-making under limited resources so as to maximize cumulative returns. This article examines the principles, classifications, and practical applications of the problem. Researchers have proposed various algorithms to address it, including ε-greedy, Upper Confidence Bound (UCB), and Thompson Sampling, all of which perform well across different scenarios. The article further elaborates on the fundamental principles of Multi-Armed Bandit algorithms, centering on the trade-off between exploration and exploitation, and classifies algorithms into probability-based methods (e.g., ε-greedy) and value-based methods (e.g., UCB). These algorithms not only provide a framework for addressing real-world problems such as advertisement placement and resource allocation, but also hold significant theoretical value in machine learning and reinforcement learning. By balancing exploration and exploitation, Multi-Armed Bandit algorithms offer effective tools for making near-optimal decisions in uncertain environments, thereby driving the development of related fields.
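To make the exploration-exploitation trade-off concrete, the following is a minimal Python sketch of the ε-greedy policy the abstract mentions. It is illustrative only and not from the paper: the function `epsilon_greedy_bandit`, the parameter names, and the Bernoulli-arm example are assumptions chosen for clarity.

```python
import random

def epsilon_greedy_bandit(pull, n_arms, n_rounds, epsilon=0.1):
    """Run an epsilon-greedy policy; `pull(arm)` returns a stochastic reward."""
    counts = [0] * n_arms    # times each arm was pulled
    values = [0.0] * n_arms  # running mean reward per arm
    total_reward = 0.0
    for _ in range(n_rounds):
        if random.random() < epsilon:
            arm = random.randrange(n_arms)  # explore: random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = pull(arm)
        counts[arm] += 1
        # Incremental update of the arm's mean reward estimate
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return values, total_reward

# Example usage: three Bernoulli arms with hidden success probabilities
# (hypothetical values, for illustration).
if __name__ == "__main__":
    probs = [0.2, 0.5, 0.7]
    estimates, reward = epsilon_greedy_bandit(
        lambda a: 1.0 if random.random() < probs[a] else 0.0,
        n_arms=3, n_rounds=10_000)
    print(estimates, reward)
```

With small ε, most pulls exploit the current best estimate, while occasional random pulls keep refining the estimates of the other arms; this is the balance between exploration and exploitation that the abstract describes.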