Abstract

Recently, the LFM-1b dataset has been proposed to foster research and evaluation in music retrieval and music recommender systems (Schedl, Proceedings of the ACM International Conference on Multimedia Retrieval (ICMR), New York, 2016). It contains more than one billion music listening events created by more than 120,000 users of Last.fm. Each listening event is characterized by artist, album, and track name, and further includes a timestamp. Basic demographic information and a selection of more elaborate listener-specific descriptors are also included for the anonymized users. In this article, we reveal information about LFM-1b’s acquisition and content, and we compare it to existing datasets. We furthermore provide an extensive statistical analysis of the dataset, including basic properties of the item sets, demographic coverage, distribution of listening events (e.g., over artists and users), and aspects related to music preference and consumption behavior (e.g., temporal features and mainstreaminess of listeners). Exploiting country information of users and genre tags of artists, we also create taste profiles for populations and determine similar and dissimilar countries in terms of their populations’ music preferences. Finally, we illustrate the dataset’s usage in a simple artist recommendation task, whose results are intended to serve as a baseline against which more elaborate techniques can be assessed.

Highlights

  • In the era of social media platforms and excessive creation of user-generated content, it has never been easier to gather and process digital user traces on a large scale, and in turn exploit them to build comprehensive user profiles.

  • For ease of access and compatibility, the metadata on artists, albums, tracks, users, and listening events are stored in simple text files, encoded in UTF-8, while the user-artist-playcount matrix is provided as a sparse matrix in a Matlab file, which complies with the HDF5 format.

  • We presented the LFM-1b dataset to support large-scale experimentation for tasks in music information retrieval and music recommender systems.
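
The relationship between the listening-event text files and the sparse user-artist-playcount matrix can be illustrated with a minimal sketch. It assumes, purely for illustration, that each listening event is one tab-separated line holding `user_id`, `artist_id`, `album_id`, `track_id`, and `timestamp`; this column layout, the function name `aggregate_playcounts`, and the sample values are assumptions and are not taken from the dataset documentation.

```python
import csv
import io
from collections import Counter


def aggregate_playcounts(lines):
    """Count listening events per (user_id, artist_id) pair.

    `lines` is any iterable of text lines. Each line is ASSUMED to hold the
    tab-separated fields user_id, artist_id, album_id, track_id, timestamp.
    """
    counts = Counter()
    # QUOTE_NONE: treat quote characters in the data as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 5:
            continue  # skip blank or malformed lines
        user_id, artist_id = int(row[0]), int(row[1])
        counts[(user_id, artist_id)] += 1
    return counts


# Tiny synthetic sample (made-up values, not real LFM-1b data).
sample = io.StringIO(
    "1\t10\t100\t1000\t1357000000\n"
    "1\t10\t100\t1001\t1357000300\n"
    "1\t11\t101\t1002\t1357000600\n"
    "2\t10\t100\t1000\t1357000900\n"
)
playcounts = aggregate_playcounts(sample)
```

Only user-artist pairs with at least one listening event are stored, which mirrors the idea of a sparse matrix: the vast majority of possible pairs have a playcount of zero and need no entry. For a file on disk, the same function could be given a handle opened with `open(path, encoding="utf-8", newline="")`.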

Summary

Introduction

In the era of social media platforms and excessive creation of user-generated content, it has never been easier to gather and process digital user traces on a large scale, and in turn exploit them to build comprehensive user profiles. The additional information and computational features about listeners (temporal profiles, novelty, and mainstreaminess) enable the creation of personalized and context-aware recommender systems. Another task we contemplate is music retrieval by time or location. The dataset in its current version can be used to model music taste on the level of user groups (e.g., based on age or gender) or countries, which opens opportunities to analyze variations and evolutions in music preferences and, complemented with publicly available data on cultural or socioeconomic aspects of populations, even to predict these music preferences from such data [15]. These predictions in turn can be used to remedy the cold-start problem in recommender systems. We exploit this information to rank countries according to the mainstreaminess of their populations’ genre preferences.
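
One way to compare countries by their populations' music preferences is to represent each country as a vector of genre shares and to compute a similarity between those vectors. The sketch below uses cosine similarity, which is an assumption made for illustration and not necessarily the measure used in the article; the country codes `AA`, `BB`, `CC` and all genre counts are made up.

```python
import math


def genre_profile(genre_counts):
    """Normalize raw per-genre listening counts into shares that sum to 1."""
    total = sum(genre_counts.values())
    if total == 0:
        return {genre: 0.0 for genre in genre_counts}
    return {genre: count / total for genre, count in genre_counts.items()}


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two genre profiles given as dicts."""
    genres = set(profile_a) | set(profile_b)
    dot = sum(profile_a.get(g, 0.0) * profile_b.get(g, 0.0) for g in genres)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty profile is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical listening counts per country and genre (illustrative only).
country_counts = {
    "AA": {"rock": 60, "pop": 30, "jazz": 10},
    "BB": {"rock": 55, "pop": 35, "jazz": 10},
    "CC": {"rock": 5, "pop": 15, "jazz": 80},
}
profiles = {c: genre_profile(counts) for c, counts in country_counts.items()}

sim_ab = cosine_similarity(profiles["AA"], profiles["BB"])
sim_ac = cosine_similarity(profiles["AA"], profiles["CC"])
```

With these made-up counts, `AA` and `BB` share a rock-heavy profile and therefore score as more similar to each other than `AA` and the jazz-heavy `CC`. Normalizing to shares first keeps countries with many users from dominating purely through volume.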

Related datasets
Description of the LFM-1b dataset
Data acquisition
Dataset availability and content
Dataset statistics
Demographics
Listening events
Descriptors of preference and consumption behavior
Sample source code
Analysis of country-specific music preferences
Coarse genre profiles
Fine-grained genre profiles
Country similarity according to music preferences
Experiments with algorithms for music recommendation
Collaborative filtering
Demographic filtering
Experiments and results
Hybrid recommender
Random baselines
Findings
Conclusions and future work