Abstract

Bernoulli bandits have been found to mirror many practical situations in reinforcement learning, where the aim is to maximize rewards by playing the machine over a set time frame. In an actual casino setting, it is often unrealistic to fix the time at which play stops: the termination of play may be random and may depend on the outcomes of earlier lever pulls, which in turn affect the gambler's inclination to continue playing. It is often assumed that exploration is repeated each time the game is played and that the game tends to go on indefinitely. In practice, if the casino does not change its machines often, exploration need not be carried out repeatedly, as this would be inefficient. Moreover, gamblers are likely to stop at some point or when certain conditions are fulfilled. Here, the bandit problem is studied in terms of stopping rules that depend on earlier random outcomes and on the behavior of the players. Rewards incorporating the cost of play and the size of payouts are calculated at the conclusion of a playing episode. The rewards for Bernoulli machines are placed within the context of martingales, which are commonly used in gambling situations, and the fairness of the game is expressed through the parameters of the bandit machines, which can manifest as various forms of martingales. The average rewards and regrets, as well as episode durations, are obtained under different martingale stopping times. Exploration costs and regrets for different bandit machines are analyzed. Experiments have also been undertaken that corroborate the theoretical results.
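
The setup described above can be illustrated with a minimal sketch, which is not the paper's method. It assumes a single Bernoulli machine with win probability `p`, a fixed `payout` per win and a fixed `cost` per pull, and it assumes one particular outcome-dependent stopping rule: the gambler quits once the net reward reaches `+target` or `-budget`. All function and parameter names are hypothetical. The sign of the per-pull drift `p * payout - cost` determines whether the cumulative net reward is a martingale (fair game), a supermartingale, or a submartingale.

```python
import random


def drift(p, payout, cost):
    """Expected net reward of one pull (hypothetical helper).

    Zero drift: the cumulative net reward is a martingale (fair game).
    Negative drift: supermartingale. Positive drift: submartingale.
    """
    return p * payout - cost


def play_episode(p, payout, cost, target, budget, rng, max_pulls=10_000):
    """Play one machine until the net reward leaves (-budget, target).

    The two-barrier stopping rule is an assumption made for illustration;
    max_pulls is a safety cap so the loop always terminates.
    Returns the final net reward and the number of pulls (episode duration).
    """
    reward = 0.0
    pulls = 0
    while -budget < reward < target and pulls < max_pulls:
        win = rng.random() < p                      # Bernoulli(p) outcome
        reward += (payout if win else 0.0) - cost   # payout minus cost of play
        pulls += 1
    return reward, pulls


def average_episode(p, payout, cost, target, budget, episodes, seed=0):
    """Monte Carlo estimate of the mean net reward and mean episode duration."""
    rng = random.Random(seed)
    total_reward = 0.0
    total_pulls = 0
    for _ in range(episodes):
        reward, pulls = play_episode(p, payout, cost, target, budget, rng)
        total_reward += reward
        total_pulls += pulls
    return total_reward / episodes, total_pulls / episodes
```

For example, `average_episode(0.5, 2.0, 1.0, target=5, budget=5, episodes=1000)` estimates the mean reward and duration for a fair machine under this assumed rule; changing `p`, `payout`, or `cost` moves the game into the supermartingale or submartingale regime.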
