Abstract

Recommendation services play a pivotal role in financial decision-making and multimedia content delivery, suggesting investment operations and personalized items to users; such items are typically characterized by multi-modal features such as visual, textual, and acoustic attributes. Graph Neural Networks (GNNs), which have demonstrated immense potential for graph representation learning and recommendation, can learn user/item embeddings that account for both the graph's topological structure and the multi-modal node features. However, many multi-modal recommendation studies overlook the inherent bias among different modalities during feature fusion, which leads to sub-optimal embeddings for items with multi-modal features. To mitigate this issue, we propose GNNMR, a novel multi-modal recommendation framework that integrates GNNs with deep mutual learning. GNNMR applies mutual knowledge distillation to collaboratively train multiple GNNs, one per modality. Each GNN is trained on a uni-modal user-item bipartite graph, split off from the original multi-modal user-item bipartite graph, and produces uni-modal embeddings. These uni-modal embeddings then serve as mutual supervision signals, allowing the model to uncover and align the latent semantic relationships among different modalities. At inference time, the model combines the uni-modal embeddings from the diverse modalities in an ensemble manner. Experimental results on two real-world datasets demonstrate that the proposed GNNMR outperforms other multi-modal recommendation methods on the Top-K recommendation task.
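To make the mutual-supervision idea concrete, the following is a minimal sketch (not the authors' implementation) of how uni-modal score matrices produced by per-modality GNNs could distill knowledge into one another and be combined at inference; the function names, the symmetric-KL loss, and the mean-ensemble rule are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def mutual_distillation_loss(scores_a: torch.Tensor,
                             scores_b: torch.Tensor,
                             temperature: float = 1.0) -> torch.Tensor:
    """Symmetric KL divergence between two modalities' user-item score
    distributions, used here as a stand-in for the mutual supervision
    signal exchanged between uni-modal GNNs (an assumption, not the
    paper's exact loss)."""
    log_p_a = F.log_softmax(scores_a / temperature, dim=-1)
    log_p_b = F.log_softmax(scores_b / temperature, dim=-1)
    kl_a_to_b = F.kl_div(log_p_a, log_p_b.exp(), reduction="batchmean")
    kl_b_to_a = F.kl_div(log_p_b, log_p_a.exp(), reduction="batchmean")
    return kl_a_to_b + kl_b_to_a

def ensemble_scores(scores_per_modality: list[torch.Tensor]) -> torch.Tensor:
    """Inference: average the uni-modal score matrices. A simple ensemble
    choice; the paper may use a different fusion rule."""
    return torch.stack(scores_per_modality, dim=0).mean(dim=0)

# Hypothetical usage: scores are user-item matrices from, e.g., a visual
# and a textual GNN (user_emb @ item_emb.T per modality).
visual_scores = torch.randn(8, 100)   # 8 users x 100 items
textual_scores = torch.randn(8, 100)
loss = mutual_distillation_loss(visual_scores, textual_scores)
final_scores = ensemble_scores([visual_scores, textual_scores])
```

In a full training loop, this distillation term would be added to each modality's own recommendation loss (e.g., BPR), so that every uni-modal GNN is supervised both by the interaction data and by its peers.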
