Abstract

The Multi-Armed Bandit problem is a reinforcement learning problem centered on the exploration-exploitation dilemma. Given a set of options (known as arms) that can be tried repeatedly, the dilemma is how to balance gathering information by experimenting with the available arms (exploration) against maximizing profit by choosing the arm that currently appears best (exploitation). The Multi-Armed Bandit problem therefore amounts to deciding which arm to choose at every round. It has gained popularity as a more dynamic alternative to a randomized trial, since its goal is to experiment with each available arm while still maximizing the profit gained. A real-life example is determining which film artwork to show a visitor in order to attract that visitor to watch the film. A Bernoulli distribution with parameter θ is chosen to model the visitor's response after seeing the artwork. A non-stationary condition on θ can be imposed to accommodate trends in film artwork: an artwork may perform well in one month yet not be preferred the next. In this study, the non-stationary condition is modeled as piecewise-stationary. We implemented a discounted Thompson sampling policy that uses a Bayesian method to determine which arm to choose at each round, and we ran multiple simulations to empirically test the policy's performance under various conditions. Evaluation was based on cumulative regret. In these simulations, the discounted Thompson sampling policy achieved relatively lower cumulative regret under both stationary and piecewise-stationary conditions than several well-known policies, namely Epsilon Greedy, SoftMax, Upper Confidence Bound, and Thompson Sampling.
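
As a concrete illustration, the following is a minimal Python sketch of how a discounted Thompson sampling policy can be simulated on a piecewise-stationary Bernoulli bandit and scored by cumulative regret. The discount factor gamma, the arm means, the change point, and the Beta(1, 1) priors are illustrative assumptions for this sketch, not values taken from the study.

import numpy as np

rng = np.random.default_rng(0)

n_arms, horizon, gamma = 3, 2000, 0.95
# Piecewise-stationary environment: the arm means change once, halfway through.
segments = [(0, [0.2, 0.5, 0.7]), (horizon // 2, [0.7, 0.4, 0.2])]

S = np.zeros(n_arms)  # discounted success counts per arm
F = np.zeros(n_arms)  # discounted failure counts per arm
cumulative_regret = 0.0

for t in range(horizon):
    # True means of the current segment (unknown to the policy).
    means = [m for start, m in segments if t >= start][-1]

    # Thompson step: sample one index per arm from its Beta posterior, play the argmax.
    samples = rng.beta(S + 1.0, F + 1.0)
    arm = int(np.argmax(samples))
    reward = rng.binomial(1, means[arm])

    # Discount every arm's counts, then credit the played arm.
    S *= gamma
    F *= gamma
    S[arm] += reward
    F[arm] += 1 - reward

    # Cumulative regret against the best arm of the current segment.
    cumulative_regret += max(means) - means[arm]

print(f"cumulative regret after {horizon} rounds: {cumulative_regret:.1f}")

Discounting both success and failure counts on every round gradually forgets old observations, so the posterior can re-concentrate on the new best arm after a change point; this forgetting is what makes the policy suitable for piecewise-stationary conditions, whereas standard Thompson Sampling keeps accumulating evidence for the formerly best arm.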
