Abstract

Personalized recommender systems play a crucial role in various online content-sharing platforms (e.g., TikTok). Learning representations for multi-modal content is pivotal in current graph-based recommender systems. Existing works aim to enhance recommendation accuracy by leveraging multi-modal features (e.g., image, sound, text) as side information for items. However, this approach falls short of fully discerning users' fine-grained preferences across different modalities. To tackle this limitation, this paper introduces the Dual-view Multi-Modal contrastive learning Recommendation model (DMM-Rec). DMM-Rec employs self-supervised learning to guide user and item representation learning in the multi-modal context. Specifically, to capture users' preferences within each modality, we propose specific-modal contrastive learning; to capture users' cross-modal preferences, we introduce cross-modal contrastive learning, which uncovers interdependencies in users' preferences across modalities. These contrastive learning tasks not only adaptively explore potential relations between modalities but also mitigate the data sparsity challenge in recommender systems. Extensive experiments on three datasets against ten baselines demonstrate that DMM-Rec outperforms the strongest baseline by 6.81% on average. These results underscore the effectiveness of exploiting multi-modal content to improve recommender systems.
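
To make the two objectives concrete, the sketch below renders them as InfoNCE-style alignment losses over modality-specific user views. This is a minimal, hypothetical illustration under assumed design choices (function names such as info_nce, the temperature value, and in-batch negatives are our assumptions), not the paper's exact formulation.

    # Hypothetical sketch of the two contrastive objectives, assuming an
    # InfoNCE-style loss with in-batch negatives (not the authors' exact code).
    import torch
    import torch.nn.functional as F

    def info_nce(anchor: torch.Tensor, positive: torch.Tensor,
                 temperature: float = 0.2) -> torch.Tensor:
        """Pull each row of `anchor` toward the matching row of `positive`
        and push it away from the other rows in the batch."""
        anchor = F.normalize(anchor, dim=-1)
        positive = F.normalize(positive, dim=-1)
        logits = anchor @ positive.t() / temperature        # [B, B] similarities
        labels = torch.arange(anchor.size(0), device=anchor.device)
        return F.cross_entropy(logits, labels)

    def specific_modal_loss(user_emb: torch.Tensor,
                            modal_user_embs: list[torch.Tensor]) -> torch.Tensor:
        """Specific-modal view: align the fused user representation with each
        modality-specific user representation (e.g., image, sound, text)."""
        return sum(info_nce(user_emb, m) for m in modal_user_embs) / len(modal_user_embs)

    def cross_modal_loss(modal_user_embs: list[torch.Tensor]) -> torch.Tensor:
        """Cross-modal view: align every pair of modality-specific
        representations to expose interdependencies across modalities."""
        pairs = [info_nce(a, b)
                 for i, a in enumerate(modal_user_embs)
                 for b in modal_user_embs[i + 1:]]
        return sum(pairs) / len(pairs)

    # Usage example: a batch of 256 users with 64-d visual and textual views.
    if __name__ == "__main__":
        u_id = torch.randn(256, 64)
        u_img, u_txt = torch.randn(256, 64), torch.randn(256, 64)
        loss = specific_modal_loss(u_id, [u_img, u_txt]) + cross_modal_loss([u_img, u_txt])
        print(loss.item())

In such a setup, both losses would typically be added to the main recommendation objective as auxiliary self-supervised terms, which is how contrastive regularization is commonly used to counter interaction sparsity.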
