Abstract
Machine learning algorithms are often used in content-based recommender systems because a recommendation task can naturally be reduced to a classification problem: a recommender needs to learn a classifier for a given user, where the learning examples are the characteristics of items previously liked, bought, or seen by that user. However, multi-valued and continuous attributes require special handling when the classifier is implemented, as they can significantly influence classifier accuracy. In this paper we propose novel approaches for handling multi-valued and continuous attributes that are suited to the naive Bayes classifier and the decision tree classifier, and we tune them for content-based movie recommendation. We evaluate the performance of the resulting approaches using the MovieLens data set enriched with movie details retrieved from the Internet Movie Database. Our empirical results demonstrate that the naive Bayes classifier is more suitable for content-based movie recommendation than the decision tree algorithm. In addition, the naive Bayes classifier achieves better results with smart discretization of continuous attributes than with the approach that models continuous attributes with a Gaussian distribution. Finally, we combine our best-performing content-based algorithm with the k-means clustering algorithm typically used for collaborative filtering, and evaluate the performance of the resulting hybrid approach on a movie recommendation task. The experimental results clearly show that the hybrid approach significantly increases recommendation accuracy compared to collaborative filtering while reducing the risk of overspecialization, which is a typical problem of content-based approaches.
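To make the reduction to classification concrete, the following is a minimal sketch, not the method proposed in the paper. It assumes a Bernoulli-style naive Bayes with Laplace smoothing, treats each value of a multi-valued attribute (here, genre) as its own binary feature, and discretizes one continuous attribute (release year) with fixed bin edges. The field names, bin edges, class labels, and the `UserNaiveBayes` class are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def discretize_year(year, edges=(1980, 1995, 2005)):
    """Map a continuous attribute to a bin label (the edges are an assumption)."""
    for i, edge in enumerate(edges):
        if year < edge:
            return f"year_bin_{i}"
    return f"year_bin_{len(edges)}"


def featurize(movie):
    """Turn one movie into a set of binary features."""
    # Multi-valued attribute: every genre becomes its own feature.
    feats = {f"genre={g}" for g in movie["genres"]}
    # Continuous attribute: replaced by the label of the bin it falls into.
    feats.add(discretize_year(movie["year"]))
    return feats


class UserNaiveBayes:
    """Per-user classifier: training examples are movies the user has rated."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                    # Laplace smoothing constant
        self.class_counts = defaultdict(int)  # label -> number of movies
        self.feature_counts = {}              # label -> feature -> count
        self.vocab = set()                    # all features seen in training

    def fit(self, movies, labels):
        for movie, label in zip(movies, labels):
            self.class_counts[label] += 1
            counts = self.feature_counts.setdefault(label, defaultdict(int))
            for feat in featurize(movie):
                counts[feat] += 1
                self.vocab.add(feat)
        return self

    def log_score(self, movie, label):
        n = self.class_counts[label]
        total = sum(self.class_counts.values())
        score = math.log(n / total)           # log prior of the class
        present = featurize(movie)
        for feat in self.vocab:
            seen = self.feature_counts[label].get(feat, 0)
            p = (seen + self.alpha) / (n + 2 * self.alpha)
            score += math.log(p if feat in present else 1.0 - p)
        return score

    def predict(self, movie):
        return max(self.class_counts, key=lambda c: self.log_score(movie, c))


# Toy rating history for a single user (invented data, for illustration only).
history = [
    {"genres": ["Sci-Fi", "Action"], "year": 1999},
    {"genres": ["Sci-Fi"], "year": 2003},
    {"genres": ["Romance"], "year": 1985},
    {"genres": ["Romance", "Drama"], "year": 1990},
]
ratings = ["like", "like", "dislike", "dislike"]

model = UserNaiveBayes().fit(history, ratings)
prediction = model.predict({"genres": ["Sci-Fi"], "year": 2001})
print(prediction)
```

Treating the mutually exclusive year bins as independent binary features is a simplification. A Gaussian variant would instead estimate a per-class mean and variance for the year and add the log density to the score.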