• We introduce two multi-modal datasets for recommendation.
• We propose a novel recommendation model called MVABPR.
• We design a set of efficient match kernel (EMK) features to boost recommendation performance.
• We use cross-modal semantics and adversarial learning to improve performance.
• The implicit feeling tone of a recommended item can be captured accurately.

Recommender systems suffer from data sparsity. Additional information, including images, texts, and videos, helps alleviate this issue. We propose a new multi-modal visual adversarial Bayesian personalized ranking (MVABPR) model to address it. The proposed model incorporates new features, cross-modal semantics, adversarial learning, and a visual interface. Two multi-modal datasets are created from the MovieLens datasets and their associated images. In addition to shape, texture, color, and deep learning-based features, we propose a set of efficient match kernel (EMK) features. More discriminative but low-dimensional cross-modal semantics are mined from these features to characterize each item effectively, and are fed into the MVABPR model through a visual interface. A new adversarial learning strategy is employed to optimize the whole training procedure, which makes the MVABPR model more robust and stable. Experimental results demonstrate that the MVABPR model is effective and robust for recommendation and that it outperforms competitive baselines. As a further advantage, it learns visual information and users' ratings jointly and effectively under adversarial learning, and it accurately captures the implicit feeling tone of a recommended item. More importantly, the model achieves better performance on a large-scale, sparser dataset, demonstrating its greater practicality.
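The abstract does not state the training objective, so the following is only a minimal sketch of how a visual BPR loss can be combined with adversarial learning in general. It assumes a linear projection `E` standing in for the "visual interface", a gradient-direction perturbation of the item visual features of size `eps`, and a weight `lam` on the adversarial term. All function names, parameters, and the perturbation scheme are illustrative assumptions, not the MVABPR formulation itself.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def score(p_u, q_i, f_i, E):
    # Preference score: user factors against item factors plus projected
    # visual features. The linear projection E is an assumed stand-in for
    # the "visual interface" mentioned in the abstract.
    return p_u @ (q_i + E @ f_i)

def bpr_loss(p_u, q_i, q_j, f_i, f_j, E):
    # Pairwise BPR loss: observed item i should outrank unobserved item j.
    x_uij = score(p_u, q_i, f_i, E) - score(p_u, q_j, f_j, E)
    return -np.log(sigmoid(x_uij))

def _unit(v):
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def adversarial_features(p_u, q_i, q_j, f_i, f_j, E, eps=0.1):
    # Perturb the visual features along the loss gradient (assumed scheme):
    # each perturbation has L2 norm eps and increases the BPR loss.
    x_uij = score(p_u, q_i, f_i, E) - score(p_u, q_j, f_j, E)
    dloss_dx = -(1.0 - sigmoid(x_uij))   # derivative of -log(sigmoid(x))
    grad_fi = dloss_dx * (E.T @ p_u)     # d loss / d f_i
    grad_fj = -dloss_dx * (E.T @ p_u)    # d loss / d f_j
    return f_i + eps * _unit(grad_fi), f_j + eps * _unit(grad_fj)

def adversarial_bpr_loss(p_u, q_i, q_j, f_i, f_j, E, eps=0.1, lam=1.0):
    # Total objective: clean BPR loss plus lam times the loss evaluated
    # on the adversarially perturbed visual features.
    clean = bpr_loss(p_u, q_i, q_j, f_i, f_j, E)
    fa_i, fa_j = adversarial_features(p_u, q_i, q_j, f_i, f_j, E, eps)
    adv = bpr_loss(p_u, q_i, q_j, fa_i, fa_j, E)
    return clean + lam * adv
```

Minimizing the combined loss pushes the model to rank the observed item above the unobserved one even when the visual features are perturbed in the worst-case direction, which is one common way adversarial training is used to improve robustness.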