An effective distributed predictive model with Matrix factorization and random forest for Big Data recommendation systems

Badr Ait Hammou, Ayoub Ait Lahcen, Salma Mouline

doi:10.1016/j.eswa.2019.06.046

Abstract

Recommendation systems have been widely deployed to address the challenge of information overload. They help users find interesting items in a large volume of data. However, in the era of Big Data, as data grow larger and more complex, a recommendation algorithm that runs in a traditional environment cannot be fast and effective. Training such an algorithm incurs a high computational cost, which may limit its applicability in real-world Big Data applications. In this paper, we propose a novel distributed recommendation solution for Big Data. It is built on Apache Spark to handle large-scale data, improve the prediction quality, and address the data sparsity problem. In particular, thanks to a novel learning process, the model significantly speeds up distributed training and improves performance in the context of Big Data. Experimental results on three real-world data sets demonstrate that our proposal outperforms existing recommendation methods in terms of Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and computational time.
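For readers unfamiliar with the two accuracy metrics named above, the following is a minimal sketch of their standard definitions, not the authors' evaluation code. The function names and the sample ratings are hypothetical and chosen only for illustration; lower values indicate better predictions for both metrics.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: the average of |rating - prediction|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Square Error: the square root of the average squared error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Hypothetical held-out ratings and model predictions (illustrative values only).
ratings = [4.0, 3.0, 5.0, 2.0]
predictions = [3.5, 3.0, 4.0, 3.0]

print(mae(ratings, predictions))   # 0.625
print(rmse(ratings, predictions))  # 0.75
```

Because RMSE squares each error before averaging, it penalizes large individual mistakes more heavily than MAE, which is why the two are commonly reported together.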

Full Text