Heterogeneous Information Assisted Bandit Learning: Theory and Application

Xiaoying Zhang, John C.S. Lui, Hong Xie

doi:10.1109/icde51399.2021.00213

Abstract

Contextual bandits are a valuable tool for balancing the exploration-exploitation trade-off in applications such as online recommendation. In many of these applications, a heterogeneous information network (HIN) can be derived to provide rich side information for the contextual bandit, such as different types of attributes and relationships among users and items. In this paper, we propose the first HIN-assisted contextual bandit framework, which uses a given HIN to assist contextual bandit learning. The framework uses meta-paths in the HIN to extract rich relations among users and items. The main challenge is how to leverage these relations: users’ preferences over items, the target of our online learning, are closely related to their preferences over meta-paths, but it is unknown which meta-path a user prefers. We propose the HUCB algorithm to address this challenge. For each meta-path, HUCB employs an independent base bandit algorithm that handles online item recommendation by leveraging the relationships captured in that meta-path. A bandit master then learns users’ preferences over meta-paths and dynamically combines the base bandit algorithms while balancing exploration and exploitation. Experimental results on real datasets from LastFM and Yelp demonstrate the efficacy of the HUCB algorithm.
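The two-level structure described in the abstract (one base bandit per meta-path, plus a master that chooses among them) can be illustrated with a minimal sketch. This is not the authors' HUCB: the abstract does not specify the update rules, so the sketch assumes a UCB1 master and a LinUCB-style base bandit with a diagonal covariance approximation. The class names, the `run` function, the feature layout, and the meta-path names are all hypothetical.

```python
import math


class DiagLinUCB:
    """Base bandit for one meta-path (assumption: LinUCB with a diagonal covariance)."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha
        self.a = [1.0] * dim  # diagonal of the ridge-regularised design matrix
        self.b = [0.0] * dim  # reward-weighted feature sums

    def ucb(self, x):
        mean = sum(bj / aj * xj for aj, bj, xj in zip(self.a, self.b, x))
        width = math.sqrt(sum(xj * xj / aj for aj, xj in zip(self.a, x)))
        return mean + self.alpha * width

    def select(self, features):
        # Return the index of the item with the highest upper confidence bound.
        return max(range(len(features)), key=lambda i: self.ucb(features[i]))

    def update(self, x, reward):
        for j, xj in enumerate(x):
            self.a[j] += xj * xj
            self.b[j] += reward * xj


class MasterUCB:
    """Master bandit over meta-paths (assumption: plain UCB1)."""

    def __init__(self, n_bases):
        self.counts = [0] * n_bases
        self.sums = [0.0] * n_bases
        self.t = 0

    def select(self):
        self.t += 1
        for k, c in enumerate(self.counts):
            if c == 0:  # try every base bandit once first
                return k
        return max(
            range(len(self.counts)),
            key=lambda k: self.sums[k] / self.counts[k]
            + math.sqrt(2.0 * math.log(self.t) / self.counts[k]),
        )

    def update(self, k, reward):
        self.counts[k] += 1
        self.sums[k] += reward


def run(features_by_path, reward_fn, rounds, alpha=1.0):
    """features_by_path maps a meta-path name to one feature vector per item."""
    paths = list(features_by_path)
    bases = [DiagLinUCB(len(features_by_path[p][0]), alpha) for p in paths]
    master = MasterUCB(len(paths))
    total = 0.0
    for _ in range(rounds):
        k = master.select()                 # master picks a meta-path
        feats = features_by_path[paths[k]]
        item = bases[k].select(feats)       # that path's base bandit picks an item
        reward = reward_fn(item)
        bases[k].update(feats[item], reward)
        master.update(k, reward)
        total += reward
    return dict(zip(paths, master.counts)), total
```

In a toy setting where one meta-path yields features that distinguish the rewarding item and another yields identical features for every item, the master in this sketch shifts most of its pulls toward the informative meta-path, which is the behaviour the abstract attributes to the bandit master.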

Full Text