Simultaneous incremental matrix factorization for streaming recommender systems

Martin Jakomin, Zoran Bosnić, Tomaž Curk

doi:10.1016/j.eswa.2020.113685

Abstract

Recommender systems are large-scale machine learning and knowledge discovery tools aimed at providing personalized recommendations to customers based on their preferences and needs. They need to handle large quantities of diverse and very sparse data in a matter of seconds. Matrix factorization techniques have proven to be useful and reliable for implementing recommender systems, while the data sparsity problem can be indirectly alleviated by considering multiple heterogeneous data sources. Furthermore, the use of data fusion can result in higher predictive accuracy. For real-world applications, e.g., those with continuous user feedback, incrementally updating recommender systems from multiple data streams remains a crucial and only partially solved problem. This paper presents one way of fusing multiple data streams through matrix factorization. Our proposed method (SIMF) models heterogeneous and asynchronous data streams and provides predictions in real time. As a result of incremental updating, the proposed method successfully adapts to changes in data concepts, while the application of data fusion improves prediction accuracy and reduces the effects of the cold-start problem. Using the proposed methodology, we have developed a streaming algorithm and show how prediction accuracy can be substantially increased by considering multiple data sources, while at the same time the negative effects of the cold-start problem can be greatly diminished. Evaluations on a large-scale real-life problem (Yelp recommendations) confirm these claims: we present a highly scalable streaming recommender system that adapts to new concepts in data and provides accurate predictions (compared to other matrix factorization techniques) in a very sparse problem domain.
Beyond the recommender system proposed in this work, the versatility of matrix factorization could allow the presented methodology to be adapted to several other machine learning problems, such as dimensionality reduction, clustering and classification.

Full Text