Abstract

Because Multi-Armed Bandit (MAB) problems require balancing exploration and exploitation, determining the optimal exploration length for a given dataset is difficult. This paper presents an in-depth investigation of hyperparameter settings for the Explore-Then-Commit (ETC) algorithm, emphasizing a forced-exploration strategy that guarantees each arm is sampled adequately. The investigation is carried out in the context of movie recommendation systems, specifically using the MovieLens 1M dataset. Two key hyperparameters are examined: the horizon, which sets the overall timeframe of the algorithm's execution, and the number of times each arm is explored. The study systematically varies these parameters to measure their influence on cumulative regret, the opportunity cost incurred each time a non-optimal arm is sampled. The empirical evaluation of the ETC algorithm on MovieLens 1M yields practical insight into configuring exploration lengths for particular datasets. The results show that careful tuning of the ETC hyperparameters, with a horizon of 50,000 and an exploration length of 500, achieves competitive performance. This configuration performs especially well for genre recommendations, reliably suggesting the genres with the highest average ratings. The research strengthens the understanding of hyperparameters under the exploration-exploitation dilemma and offers a structured pathway for applying MAB methods more broadly in recommendation systems.
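The two hyperparameters studied, the horizon and the per-arm exploration length, can be made concrete with a minimal sketch of Explore-Then-Commit. The sketch below simulates Bernoulli reward arms as an illustrative stand-in for observed movie ratings; the function name, arm means, and seed are assumptions for illustration, not the paper's actual experimental code.

```python
import random

def explore_then_commit(arm_means, horizon, m, seed=0):
    """ETC sketch: pull each arm m times (forced exploration),
    then commit to the arm with the best empirical mean.
    Returns cumulative pseudo-regret over the horizon."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    best_mean = max(arm_means)
    regret = 0.0
    for t in range(horizon):
        if t < k * m:
            # forced round-robin exploration: every arm gets m pulls
            arm = t % k
        else:
            # commit phase: exploit the best empirical mean
            arm = max(range(k), key=lambda i: sums[i] / counts[i])
        # simulated Bernoulli reward (illustrative stand-in for a rating signal)
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # pseudo-regret: gap between the best arm's mean and the pulled arm's mean
        regret += best_mean - arm_means[arm]
    return regret
```

With a configuration analogous to the paper's (horizon 50,000, exploration length 500 per arm), a longer exploration phase raises the guaranteed exploration cost but lowers the chance of committing to a suboptimal arm, which is exactly the trade-off the study sweeps over.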
