With the rapid growth of personalized recommendation systems, numerous algorithms have been designed to improve user satisfaction and engagement. Multi-armed bandit (MAB) models have been increasingly used in this context because they balance exploration and exploitation in a principled way. This work provides an in-depth comparative study of widely used MAB algorithms, namely ε-greedy, Upper Confidence Bound (UCB1), and Thompson Sampling, in the context of recommendation systems. The paper benchmarks the computational efficiency, cumulative reward, and adaptability of these models by running them at interactive speeds on simulations of real-world user interactions. The results suggest that UCB1 performs well in stable environments, whereas Thompson Sampling excels in volatile settings. The paper examines the properties of these MAB algorithms through a systematic review of recent research and discusses their applications in areas such as online advertising, streaming services, and e-commerce. Future research will likely focus on integrating deep learning approaches with MAB methods to further improve recommendation systems.
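The abstract does not reproduce the paper's simulation details, but a minimal sketch of the three policies it compares, run on a simulated Bernoulli bandit with illustrative (assumed) click-through rates, might look like the following:

```python
# Hedged sketch: arm probabilities, horizon, and epsilon are illustrative
# assumptions, not the paper's actual experimental configuration.
import numpy as np

rng = np.random.default_rng(0)
TRUE_PROBS = np.array([0.30, 0.50, 0.65])  # assumed per-arm click-through rates
K, T = len(TRUE_PROBS), 10_000

def run(select):
    """Run one bandit policy for T rounds; return its cumulative reward."""
    counts = np.zeros(K)     # pulls per arm
    successes = np.zeros(K)  # observed rewards per arm
    total = 0.0
    for t in range(1, T + 1):
        arm = select(t, counts, successes)
        reward = float(rng.random() < TRUE_PROBS[arm])
        counts[arm] += 1
        successes[arm] += reward
        total += reward
    return total

def eps_greedy(t, counts, successes, eps=0.1):
    # Explore uniformly with probability eps (and until every arm is tried).
    if counts.min() == 0 or rng.random() < eps:
        return int(rng.integers(K))
    return int(np.argmax(successes / counts))

def ucb1(t, counts, successes):
    # Pull each arm once, then maximize empirical mean plus confidence bonus.
    if counts.min() == 0:
        return int(np.argmin(counts))
    bonus = np.sqrt(2 * np.log(t) / counts)
    return int(np.argmax(successes / counts + bonus))

def thompson(t, counts, successes):
    # Sample from each arm's Beta(1 + successes, 1 + failures) posterior.
    samples = rng.beta(1 + successes, 1 + counts - successes)
    return int(np.argmax(samples))

for name, policy in [("eps-greedy", eps_greedy), ("UCB1", ucb1),
                     ("Thompson", thompson)]:
    print(f"{name:>10}: cumulative reward = {run(policy):.0f}")
```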