Abstract
Recently, the LFM-1b dataset has been proposed to foster research and evaluation in music retrieval and music recommender systems (Schedl, Proceedings of the ACM International Conference on Multimedia Retrieval (ICMR), New York, 2016). It contains more than one billion music listening events created by more than 120,000 users of Last.fm. Each listening event is characterized by artist, album, and track name, and further includes a timestamp. Basic demographic information and a selection of more elaborate listener-specific descriptors are also included for the anonymized users. In this article, we reveal information about LFM-1b’s acquisition and content, and we compare it to existing datasets. We furthermore provide an extensive statistical analysis of the dataset, including basic properties of the item sets, demographic coverage, distribution of listening events (e.g., over artists and users), and aspects related to music preference and consumption behavior (e.g., temporal features and mainstreaminess of listeners). Exploiting country information of users and genre tags of artists, we also create taste profiles for populations and determine similar and dissimilar countries in terms of their populations’ music preferences. Finally, we illustrate the dataset’s usage in a simple artist recommendation task, whose results are intended to serve as a baseline against which more elaborate techniques can be assessed.
Highlights
In the era of social media platforms and excessive creation of user-generated content, it has never been easier to gather and process digital user traces on a large scale, and in turn exploit them to build comprehensive user profiles
For ease of access and compatibility, the metadata on artists, albums, tracks, users, and listening events are stored in simple text files, encoded in UTF-8, while the user-artist-playcount matrix is provided as a sparse matrix in a Matlab file, which complies with the HDF5 format
We presented the LFM-1b dataset to support large-scale experimentation for tasks in music information retrieval and music recommender systems
Summary
In the era of social media platforms and excessive creation of user-generated content, it has never been easier to gather and process digital user traces on a large scale, and in turn exploit them to build comprehensive user profiles. The additional information and computational features about listeners (temporal profiles, novelty, and mainstreaminess) enable the creation of personalized and context-aware recommender systems. Another task we contemplate is music retrieval by time or location. The dataset in its current version can be used to model music taste on the level of user groups (e.g., based on age or gender) or countries, which opens opportunities to analyze variations and evolutions in music preferences and, complemented with publicly available data on cultural or socioeconomic aspects of populations, even to predict these music preferences from such data [15]. These predictions in turn can be used to remedy the cold-start problem in recommender systems. We exploit this information to rank countries according to the mainstreaminess of their populations’ genre preferences.
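As an illustration of how country-level taste profiles of this kind might be built, the following Python sketch aggregates playcounts into per-country genre distributions and compares two countries with cosine similarity. The toy data, the function names, and the choice of cosine similarity are assumptions made for this example only; the text above does not state which measure is used with LFM-1b.

```python
from collections import defaultdict
from math import sqrt

# Toy data invented for illustration; none of it is taken from LFM-1b.
user_country = {"u1": "US", "u2": "US", "u3": "FI", "u4": "BR"}
artist_genres = {"a1": ["rock"], "a2": ["metal", "rock"], "a3": ["pop"]}
playcounts = [  # (user, artist, playcount)
    ("u1", "a1", 10), ("u1", "a3", 30),
    ("u2", "a3", 20),
    ("u3", "a2", 40), ("u3", "a1", 10),
    ("u4", "a3", 25), ("u4", "a1", 5),
]


def country_profiles(playcounts, user_country, artist_genres):
    """Return {country: {genre: share}} with shares summing to 1 per country."""
    counts = defaultdict(lambda: defaultdict(float))
    for user, artist, n in playcounts:
        country = user_country[user]
        # Each genre tag of the artist receives the full playcount.
        for genre in artist_genres[artist]:
            counts[country][genre] += n
    profiles = {}
    for country, genre_counts in counts.items():
        total = sum(genre_counts.values())
        profiles[country] = {g: c / total for g, c in genre_counts.items()}
    return profiles


def cosine(p, q):
    """Cosine similarity between two sparse genre profiles (dicts)."""
    dot = sum(value * q.get(genre, 0.0) for genre, value in p.items())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)


profiles = country_profiles(playcounts, user_country, artist_genres)
sim_us_br = cosine(profiles["US"], profiles["BR"])
sim_us_fi = cosine(profiles["US"], profiles["FI"])
```

In this toy setup the US and BR profiles have identical genre shares, so they come out as more similar to each other than US and FI. With real data, the genre vocabulary and the way playcounts are split across an artist's tags would need to be chosen deliberately.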