Benchmarking big data recommendation algorithms using Hadoop or Apache Spark

Dinesh Kumar Saini, Arshad Muhammad, Kashif Zia

doi:10.1049/pbpc035f_ch3

Abstract

Recommender, or recommendation, systems have gained popularity in recent years, and big data is the driving force behind them. Recommendation systems have changed the way websites communicate with users by providing recommendations based on user history, such as purchases and searches. They are used in a variety of areas, such as movies, music, research articles and social tags. Examples include Facebook's “People you may know,” Netflix's “Because you watched” and YouTube's “Recommended for you.” These systems usually produce a list of recommendations in one of two ways: collaborative filtering (CF) or content-based (CB) filtering. CF is based on a model of prior user behavior, which can be constructed from a single user's actions or from the actions of other users who behave similarly, while CB filtering builds recommendations from the user's own behavior, for example historical browsing information. A hybrid approach combines the two models. Designing such systems requires computing function values at several thousand points, which makes them computationally intensive. These systems therefore need parallel computation to speed up the search for an acceptable solution, which can be recommended through nature-inspired computation. Many factors are essential when designing accurate recommendation algorithms, including diversity, recommender persistence, privacy, user demographics, trust and labeling. A recommendation system cannot do its job without data, and big data supplies the volume of user data it needs, such as past purchase history and browsing history [1,2]. In fact, an efficient recommendation system requires big data. The best solution is Hadoop, a platform used to store, generate, manage and distribute big data across several large server nodes [3-5]. Hadoop offers the Hadoop Distributed File System (HDFS), which distributes data across the nodes of a cluster so that operations can run in parallel. This chapter explores big data issues, specifically in Hadoop and HDFS.
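
The collaborative filtering idea summarized in the abstract can be illustrated with a minimal single-machine sketch. It is not the chapter's benchmark code and does not use Hadoop or Spark. The toy ratings, the user and item names, the choice of cosine similarity, and the helper functions `cosine_similarity` and `recommend` are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not data from the chapter.
ratings = {
    "alice": {"m1": 5.0, "m2": 3.0, "m3": 4.0},
    "bob":   {"m1": 5.0, "m2": 3.0, "m4": 5.0},
    "carol": {"m1": 1.0, "m2": 5.0, "m5": 4.0},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by the similarity-weighted
    average of the ratings given by other users."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with no overlap in rated items
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only recommend unseen items
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


print(recommend("alice", ratings))
```

In this sketch every user is compared with every other user, so the cost grows quickly with the number of users and items. That pairwise comparison is the kind of work that benefits from being distributed across a cluster.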

Full Text