Abstract
Online context-based domains such as recommender systems strive to promptly suggest appropriate items to users according to information about both items and users. In practice, however, such contextual information may be unavailable, leaving users' interaction data as the only usable signal. Furthermore, the scarcity of click records, especially for new users, degrades the performance of the system. To address these issues, we combine similarity measuring, one of the key techniques in collaborative filtering, with the online context-based multi-armed bandit mechanism. The similarity between the context of a selected item and that of each candidate item is calculated and weighted, and an adaptive method is proposed to adjust the weights according to the time elapsed since the click. The weighted similarity is then multiplied by the action value to determine which action is optimal or poorest. Additionally, we derive an exploration probability equation that incorporates the number of times the poorest action has been selected and the variance of the action values, in order to balance exploration and exploitation. A regret analysis is given and an upper bound on the regret is proved. Empirical studies on three benchmarks, a random dataset, Yahoo!R6A, and MovieLens, demonstrate the effectiveness of the proposed method.
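The selection rule described above, a time-weighted context similarity multiplied by the action value, can be sketched as follows. The exponential decay form, the cosine similarity, and the names `time_decay_weight`, `weighted_action_values`, and `decay` are illustrative assumptions, not the paper's exact equations.

```python
import numpy as np

def time_decay_weight(elapsed, decay=0.1):
    # Hypothetical adaptive weight: clicks observed longer ago count less.
    # The paper's exact weighting rule is not reproduced here.
    return np.exp(-decay * elapsed)

def weighted_action_values(q, contexts, clicked_context, elapsed, decay=0.1):
    # Cosine similarity between each candidate item's context and the
    # context of the clicked item, scaled by the time-decay weight and
    # multiplied with the current action-value estimates q.
    sims = contexts @ clicked_context / (
        np.linalg.norm(contexts, axis=1) * np.linalg.norm(clicked_context)
    )
    return q * time_decay_weight(elapsed, decay) * sims
```

Under this sketch, the candidate with the largest weighted value would be recommended as the optimal action, and the one with the smallest treated as the poorest.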
Highlights
Sequential decision making [1] has received much attention in recent years due to the rapid increase in the generation of streaming data
A feasible method to deal with this problem is using user features as the context for an agent’s decision making, which is called a contextual multi-armed bandit (CMAB) model [10]
Recommending an optimal item to a specific user remains an open challenge. To address it, we propose a CMAB algorithm based on similarity measuring, called CMAB_SM
Summary
Sequential decision making [1] has received much attention in recent years due to the rapid increase in the generation of streaming data. A feasible way to deal with this problem is to use user features as the context for an agent's decision making, in what is called a contextual multi-armed bandit (CMAB) model [10]. CMAB_SM designs a similarity measuring method to determine the similarity between any candidate item and the selected clicked item. Based on this similarity, the candidate item with the largest new action value can be recommended to the user even when the users' interaction data are sparse. Our proposed method can be attributed to the third category because it utilizes the similarity of the context. Different from these existing works, the similarity is not used for calculating the accumulative return, but only for selecting the optimal and poorest actions. The policy is a mapping from items to users, so the task of taking an action is to assign an item to a user
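The exploration/exploitation balance summarized above, an exploration probability driven by the variance of the action values and by how often the poorest action has been selected, could look roughly like this. The specific formula and the constant `c` are assumptions for illustration, not the equation from the paper.

```python
import numpy as np

def select_action(q, counts, rng, c=1.0):
    # Poorest action: the one with the smallest current action value.
    worst = int(np.argmin(q))
    # Illustrative exploration probability: large when the action values
    # disagree strongly (high variance) and the poorest action has rarely
    # been tried; it shrinks as counts[worst] grows.
    eps = min(1.0, c * float(np.var(q)) / (1.0 + counts[worst]))
    if rng.random() < eps:
        return int(rng.integers(len(q)))  # explore: pick a random arm
    return int(np.argmax(q))              # exploit: pick the best arm
```

When all action values coincide, the variance (and hence the exploration probability) is zero, so the sketch degenerates to pure exploitation.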