Matrix factorization (MF) and its extended methodologies have been studied extensively in the community of recommender systems in the last decade. Essentially, MF attempts to search for low-ranked matrices that can (1) best approximate the known rating scores, and (2) maintain low Frobenius norm for the low-ranked matrices to prevent overfitting. Since the two objectives conflict with each other, the common practice is to assign the relative importance weights as the hyper-parameters to these objectives. The two low-ranked matrices returned by MF are often interpreted as the latent factors of a user and the latent factors of an item that would affect the rating of the user on the item. As a result, it is typical that, in the loss function, we assign a regularization weight λ p on the norms of the latent factors for all users, and another regularization weight λ q on the norms of the latent factors for all the items. We argue that such a methodology probably over-simplifies the scenario. Alternatively, we probably should assign lower constraints to the latent factors associated with the items or users that reveal more information, and set higher constraints to the others. In this article, we systematically study this topic. We found that such a simple technique can improve the prediction results of the MF-based approaches based on several public datasets. Specifically, we applied the proposed methodology on three baseline models -- SVD, SVD++, and the NMF models. We found that this technique improves the prediction accuracy for all these baseline models. Perhaps more importantly, this technique better predicts the ratings on the long-tail items, i.e., the items that were rated/viewed/purchased by few users. This suggests that this approach may partially remedy the cold-start issue. The proposed method is very general and can be easily applied on various recommendation models, such as Factorization Machines, Field-aware Factorization Machines, Factorizing Personalized Markov Chains, Prod2Vec, Behavior2Vec, and so on. We release the code for reproducibility. We implemented a Python package that integrates the proposed regularization technique with the SVD, SVD++, and the NMF model. The package can be accessed at https://github.com/ncu-dart/rdf.
Read full abstract