Multi-Interest Multi-Round Conversational Recommendation System with Fuzzy Feedback based User Simulator

Qi Shen, Zhihua Wei, Yitong Pang, Bo Long, Fangli Xu, Yiming Zhang, Jian Pei, Lingfei Wu

doi:10.1145/3616379

Abstract

A conversational recommendation system (CRS) obtains fine-grained and dynamic user preferences through interactive dialogue. Previous CRS work assumes that the user has a single clear target item, which often deviates from real scenarios. A user may have a clear single preference for some attribute types (e.g., brand), while for other attribute types (e.g., color) the user may have multiple preferences or no clear preference at all, which leads to multiple acceptable items under multiple combinations of attribute instances. Furthermore, previous work assumes that users provide clear responses to any question asked by the system and that users are dedicated to the target item; that is, the user answers "yes" to attributes of the target item and "no" to all other attributes. However, users' responses to attributes do not depend solely on target items; they are also influenced by users' inherent interests. Moreover, for over-specific or equivocal questions, the user's feedback might not be a clear "yes" or "no", and the user might instead give a fuzzy response such as "I don't know". To address these issues, we first propose a more realistic conversational recommendation learning setting, namely Multi-Interest Multi-round Conversational Recommendation (MIMCR), where users may have multiple interests in attribute instance combinations and accept multiple items with partially overlapping combinations of attribute instances. To cope effectively with MIMCR, we propose a novel learning framework, namely Multiple Choice questions based Multi-Interest Policy Learning. We further propose a more realistic User-centric User Simulator with Fuzzy Feedback (UUSFF), which naturally calibrates the user response with additional fuzzy feedback based on the user's inherent preferences. To better match the new UUSFF scenario, we propose a simple but effective adaptation method for different backbones. Extensive experimental results on several datasets demonstrate the superiority of our methods for the proposed settings.

Full Text