Abstract

The Multi-armed Bandit algorithm is a consequential tool for informed decision-making: rather than relying on intuitive selections, it systematically assesses the available alternatives in order to identify the most promising outcome. Among its algorithmic variants, the Stochastic Stationary Bandit plays a foundational and enduring role, with versatile applications across diverse domains including digital advertising, price optimization, and recommendation systems. With these considerations in view, the present study undertakes a comprehensive examination of this subject. This paper reviews the Explore-Then-Commit algorithm, the Upper Confidence Bound algorithm, and the Thompson Sampling algorithm, explaining and comparing their formulations, features, and expected results. The Explore-Then-Commit algorithm has a distinct phase in which it explores all choices uniformly. The Upper Confidence Bound algorithm makes decisions by computing an upper confidence index, an optimistic overestimate of each choice's value. The Thompson Sampling algorithm relies on randomness to make its choices. The Explore-Then-Commit algorithm faces the problem of deciding when to explore and when to stop exploring; the Upper Confidence Bound and Thompson Sampling algorithms avoid this problem by dispensing with a separate exploration phase. The Multi-armed Bandit algorithm can handle the process of displaying items of potential interest to users in a recommendation system, the delivery of resources in resource allocation, or the maximization of revenue in a business.
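
As a minimal illustrative sketch, the three strategies can be contrasted on a simulated Bernoulli bandit. The arm means, horizon, and exploration length m below are hypothetical choices for demonstration, not values taken from the study:

```python
# A minimal sketch contrasting the three strategies on a Bernoulli bandit.
# MEANS, HORIZON, and m are illustrative assumptions, not values from the paper.
import math
import random

MEANS = [0.3, 0.5, 0.7]   # hypothetical true success probability of each arm
HORIZON = 10_000          # total number of rounds to play

def pull(arm):
    """Simulate one Bernoulli reward from the chosen arm."""
    return 1.0 if random.random() < MEANS[arm] else 0.0

def explore_then_commit(m=100):
    """Explore every arm m times uniformly, then commit to the best empirical mean."""
    k = len(MEANS)
    sums = [0.0] * k
    for t in range(m * k):
        arm = t % k                       # distinct uniform exploration phase
        sums[arm] += pull(arm)
    best = max(range(k), key=lambda a: sums[a] / m)
    return sum(sums) + sum(pull(best) for _ in range(HORIZON - m * k))

def ucb1():
    """Each round, play the arm with the largest optimistic upper confidence index."""
    k = len(MEANS)
    counts, sums = [0] * k, [0.0] * k
    total = 0.0
    for t in range(1, HORIZON + 1):
        if t <= k:
            arm = t - 1                   # play each arm once to initialize
        else:
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total

def thompson_sampling():
    """Sample a plausible mean from each arm's Beta posterior; play the argmax."""
    k = len(MEANS)
    alpha, beta = [1] * k, [1] * k        # uniform Beta(1, 1) priors
    total = 0.0
    for _ in range(HORIZON):
        arm = max(range(k), key=lambda a: random.betavariate(alpha[a], beta[a]))
        r = pull(arm)
        alpha[arm] += int(r)              # posterior update on a success...
        beta[arm] += 1 - int(r)           # ...or on a failure
        total += r
    return total

if __name__ == "__main__":
    for name, run in [("Explore-Then-Commit", explore_then_commit),
                      ("UCB1", ucb1),
                      ("Thompson Sampling", thompson_sampling)]:
        print(f"{name}: total reward ~ {run():.0f} over {HORIZON} rounds")
```

The sketch makes the structural contrast concrete: only Explore-Then-Commit has a phase boundary to tune (its argument m), while UCB1 and Thompson Sampling blend exploration into every round, through the optimistic index and the posterior sampling respectively.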
