Combining content with user preferences for TED lecture recommendation

Nikolaos Pappas, Andrei Popescu-Belis

doi:10.1109/cbmi.2013.6576551

Abstract

This paper introduces a new dataset and compares several methods for recommending non-fiction audiovisual material, namely lectures from the TED website. The TED dataset contains 1,149 talks and 69,023 user profiles; these users have made more than 100,000 ratings and 200,000 comments. The dataset, which we make public, can be used to train and test both generic and personalized recommendation tasks. We define content-based, collaborative, and combined recommendation methods for TED lectures and use cross-validation to select the best parameters of keyword-based (TFIDF) and semantic vector space-based (LSI, LDA, RP, and ESA) methods. We compare these methods on a personalized recommendation task in two settings: cold-start and non-cold-start. In the former, semantic vector space-based methods perform better than keyword-based ones. In the latter, where collaborative information can be exploited, content-based methods are outperformed by collaborative filtering, but the proposed combined method shows acceptable performance and can be used in both settings.

Full Text