Abstract
Conversational contextual bandits are a notable variant of contextual bandits and have been shown to have superior performance in recommendation applications. Their core idea is to use conversational feedback from users to speed up the learning of user preferences. We show that in real-world applications conversational feedback can be imbalanced, and that such feedback causes the latest conversational contextual bandit algorithm to conduct many conversations yet learn more slowly than the baseline algorithm without conversational feedback. How should imbalanced conversational feedback be handled? How should conversations be scheduled across the learning horizon? An in-depth analysis of the limitations of one representative conversational contextual bandit algorithm yields insights for designing the ICF-UCB (Imbalanced Conversational Feedback Upper Confidence Bound) algorithm, which maintains a fast learning speed under imbalanced feedback. ICF-UCB achieves this by adaptively eliminating conversations that may slow down learning. Furthermore, ICF-UCB adaptively schedules conversations to the decision rounds where suboptimal actions may trap the decision maker, and it adaptively selects appropriate conversations to avoid such traps. ICF-UCB is shown to have sublinear regret. Extensive experiments on synthetic datasets and public real-world datasets (from Yelp and TripAdvisor) validate the superior performance of ICF-UCB for recommendation tasks.