Abstract
In recent years, the advertising sector has increasingly adopted multi-armed bandit (MAB) algorithms for their versatile applications. This article examines four stochastic MAB algorithms: Explore-Then-Commit (ETC), Upper Confidence Bound (UCB), Thompson Sampling (TS), and their respective variants, applied to an online shopping platform's advertising campaign to optimize click-through rates. Our findings indicate that each of these algorithms successfully identifies the most effective advertisement. Both the UCB and TS algorithms exhibit regret that grows logarithmically in the time horizon. The ETC algorithm enters the logarithmic-regret regime later, yet remains surprisingly competitive with the UCB algorithms. Notably, the TS class outperforms the UCB class: its inherent randomization yields more robust performance. Furthermore, two variants, UCB with the Minimax Optimal Strategy in the Stochastic case (MOSS) and a parallelized TS algorithm, show promising results: they retain logarithmic regret while improving efficiency, indicating further potential in the UCB and TS algorithm classes for advanced applications.
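To make the setting concrete, the sketch below shows Thompson Sampling for Bernoulli-reward bandits, the TS variant typically used for click-through-rate optimization (each ad impression yields a click or no click). The function name, the example click-through rates, and the uniform Beta(1, 1) priors are illustrative assumptions, not details taken from the article.

```python
import random

def thompson_sampling(true_ctrs, horizon, seed=0):
    """Thompson Sampling for Bernoulli bandits with Beta(1, 1) priors.

    true_ctrs: hypothetical click-through rates, one per advertisement
               (assumed values for illustration only).
    Returns the total number of clicks collected over the horizon.
    """
    rng = random.Random(seed)
    n_arms = len(true_ctrs)
    successes = [1] * n_arms  # Beta posterior alpha (clicks + 1)
    failures = [1] * n_arms   # Beta posterior beta (non-clicks + 1)
    clicks = 0
    for _ in range(horizon):
        # Sample a plausible CTR for each ad from its posterior,
        # then show the ad whose sampled CTR is highest.
        samples = [rng.betavariate(successes[i], failures[i])
                   for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Simulate whether the user clicks this impression.
        reward = 1 if rng.random() < true_ctrs[arm] else 0
        clicks += reward
        # Bayesian update of the chosen ad's posterior.
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return clicks

# Example: three ads with (assumed) CTRs of 10%, 5%, and 20%.
total = thompson_sampling([0.10, 0.05, 0.20], horizon=5000)
```

Because each round plays the arm with the highest *sampled* CTR rather than the highest empirical mean, exploration arises from posterior randomness itself; this randomization is the source of the robustness the abstract attributes to the TS class.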