Abstract

In data science, recommendation is one of the most popular methodologies and has been deployed in many industries. However, one of the most challenging current problems is how to recommend items from massive streaming data. This paper therefore aims to build a real-time recommendation engine using the Lambda architecture. The Apache Hadoop and Apache Spark frameworks were used to process the MovieLens datasets from GroupLens Research, comprising 100 K and 20 M ratings. Using the alternating least squares (ALS) and k-means algorithms, the top K recommended movies and the top K trending movies for each user were produced as results. Additionally, the mean squared error (MSE) and the within-cluster sum of squared errors (WCSS) were computed to evaluate the performance of the ALS and k-means algorithms, respectively. The results were judged acceptable because the MSE and WCSS values are low relative to the size of the data. However, they could still be improved by tuning some parameters.
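The two evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' Spark implementation; the function names `mean_squared_error` and `wcss`, and the toy ratings and points, are assumptions made only for illustration. MSE averages the squared differences between actual and predicted ratings, and WCSS sums the squared Euclidean distances from each point to its assigned cluster centroid.

```python
from typing import Sequence


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average squared difference between actual and predicted ratings."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def wcss(
    points: Sequence[Sequence[float]],
    centroids: Sequence[Sequence[float]],
    labels: Sequence[int],
) -> float:
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    if len(points) != len(labels):
        raise ValueError("each point needs exactly one cluster label")
    total = 0.0
    for point, label in zip(points, labels):
        center = centroids[label]
        total += sum((x - c) ** 2 for x, c in zip(point, center))
    return total


# Hypothetical toy data: three ratings, and three 2-D points in two clusters.
toy_mse = mean_squared_error([4.0, 3.0, 5.0], [3.5, 3.0, 4.0])
toy_wcss = wcss([(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)],
                [(1.0, 0.0), (10.0, 10.0)],
                [0, 0, 1])
```

Both quantities grow with the number of observations or with their spread (WCSS is a sum, not an average), which is why the abstract interprets them relative to the size of the data.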
