Multi-Armed Bandit (MAB) algorithms are classic methods for sequential decision-making under uncertainty, addressing the exploration-exploitation trade-off. This study compares the performance of several MAB algorithms in a simulated multi-armed bandit machine whose rewards follow a Bernoulli distribution. Specifically, it compares Upper Confidence Bound (UCB), Thompson Sampling (TS), and ε-Greedy Thompson Sampling (ε-TS) in this environment, varying the parameters of the setup, namely the number of arms and the number of experimental rounds, and computing the cumulative regret of each algorithm under each condition. The size of the cumulative regret reflects each algorithm's performance in the simulated slot-machine model with Bernoulli-distributed rewards. In addition, the running time of each algorithm under identical conditions is recorded to analyze the algorithms from an efficiency perspective. The experimental results show that, in this environment, the cumulative regret produced by the UCB algorithm is more than three times that of the other two algorithms. When the number of trials is small, the TS algorithm produces lower cumulative regret, but overall the TS and ε-TS configurations used in this experiment differ little in minimizing cumulative regret. However, TS runs in less time under the same conditions. The results also show that once the number of experimental rounds is sufficiently large, the running efficiency of the TS algorithm is significantly higher than that of the ε-TS algorithm. TS has greater randomness and therefore performs better under these experimental conditions. With the parameter settings used in this study, ε-TS encourages more exploration in the early stage of the experiment and thus incurs greater cumulative regret than the traditional TS algorithm. In the long run, the performance difference between the two on the multi-armed bandit problem with Bernoulli-distributed rewards is very small; however, the TS algorithm has an advantage in execution efficiency, so it is the better choice when solving similar problems.
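As a minimal sketch of the kind of comparison described above, the following Python code simulates a Bernoulli bandit and measures the cumulative (pseudo-)regret of UCB1, Thompson Sampling with Beta posteriors, and an ε-greedy variant of Thompson Sampling. The arm success probabilities, the value ε = 0.1, the horizon, and the function names are illustrative assumptions, not the actual parameters or implementation used in the study.

```python
# Illustrative comparison of UCB, TS, and eps-TS on a simulated Bernoulli bandit.
# Arm probabilities, epsilon, and horizon are assumed values, not the study's settings.
import numpy as np

rng = np.random.default_rng(0)

def run_bandit(probs, horizon, select_arm):
    """Simulate a Bernoulli bandit and return the cumulative regret curve."""
    k = len(probs)
    counts = np.zeros(k)       # pulls per arm
    successes = np.zeros(k)    # observed rewards per arm
    best = max(probs)
    regret = np.zeros(horizon)
    total = 0.0
    for t in range(horizon):
        arm = select_arm(t, counts, successes)
        reward = rng.random() < probs[arm]
        counts[arm] += 1
        successes[arm] += reward
        total += best - probs[arm]   # expected (pseudo-)regret of this pull
        regret[t] = total
    return regret

def ucb1(t, counts, successes):
    # Play each arm once, then choose the arm with the largest upper confidence bound.
    if t < len(counts):
        return t
    means = successes / counts
    bonus = np.sqrt(2.0 * np.log(t + 1) / counts)
    return int(np.argmax(means + bonus))

def thompson(t, counts, successes):
    # Sample a mean for each arm from its Beta(1 + successes, 1 + failures) posterior.
    samples = rng.beta(1 + successes, 1 + counts - successes)
    return int(np.argmax(samples))

def eps_thompson(eps):
    # eps-greedy Thompson Sampling: with probability eps pull a uniformly random arm,
    # otherwise fall back to the ordinary Thompson Sampling draw.
    def select(t, counts, successes):
        if rng.random() < eps:
            return int(rng.integers(len(counts)))
        return thompson(t, counts, successes)
    return select

if __name__ == "__main__":
    probs = [0.3, 0.5, 0.7, 0.9]   # hypothetical arm success probabilities
    horizon = 10_000
    for name, policy in [("UCB", ucb1),
                         ("TS", thompson),
                         ("eps-TS (eps=0.1)", eps_thompson(0.1))]:
        final = run_bandit(probs, horizon, policy)[-1]
        print(f"{name:>18}: cumulative regret = {final:.1f}")
```

Under assumptions like these, the ε-greedy variant keeps exploring uniformly at a fixed rate, which is why it can accumulate extra regret early on relative to plain Thompson Sampling.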