Abstract

This paper is concerned with the impact of hubness, a general problem of machine learning in high-dimensional spaces, on a real-world music recommendation system based on visualisation of a k-nearest neighbour (knn) graph. Due to a problem of measuring distances in high dimensions, hub objects are recommended over and over again while anti-hubs are nonexistent in recommendation lists, resulting in poor reachability of the music catalogue. We present mutual proximity graphs, an alternative to knn and mutual knn graphs that avoids hub vertices with abnormally high connectivity. We show that mutual proximity graphs yield much better graph connectivity, resulting in improved reachability compared to knn graphs, mutual knn graphs and mutual knn graphs enhanced with minimum spanning trees, while simultaneously reducing the negative effects of hubness.
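To make the three graph types concrete, the following minimal Python/NumPy sketch builds them from a precomputed distance matrix D. The knn and mutual knn constructions follow their standard definitions. For the mutual proximity graph we assume (our assumption, not a detail stated here) that it is a knn graph built on distances rescaled by one common empirical estimate of mutual proximity (Schnitzer et al, 2012). All function names are hypothetical, and the variant enhanced with minimum spanning trees is not shown.

```python
import numpy as np

def knn_graph(D, k):
    """Directed knn graph as a boolean adjacency matrix: A[i, j] is True
    if j is among the k nearest neighbours of i under distance matrix D."""
    n = D.shape[0]
    D = D.astype(float).copy()
    np.fill_diagonal(D, np.inf)            # a vertex is never its own neighbour
    nn = np.argsort(D, axis=1)[:, :k]      # indices of the k closest vertices per row
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), nn.ravel()] = True
    return A

def mutual_knn_graph(D, k):
    """Undirected mutual knn graph: keep edge i-j only if i and j
    appear in each other's knn lists."""
    A = knn_graph(D, k)
    return A & A.T

def mutual_proximity(D):
    """Empirical mutual proximity, returned as a distance 1 - MP.

    MP(i, j) is estimated as the fraction of the other n - 2 objects m that are
    farther from i than j is AND farther from j than i is.
    O(n^3); intended for small illustrative examples only.
    """
    n = D.shape[0]
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            others = np.ones(n, dtype=bool)
            others[[i, j]] = False
            shared = np.sum((D[i, others] > D[i, j]) & (D[j, others] > D[j, i]))
            out[i, j] = out[j, i] = 1.0 - shared / (n - 2)
    return out

def mutual_proximity_graph(D, k):
    """Hypothetical construction: a knn graph over MP-rescaled distances."""
    return knn_graph(mutual_proximity(D), k)
```

Dense boolean matrices keep the sketch readable; a catalogue-scale system would need sparse neighbour lists and a faster mutual proximity estimate.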

Highlights

  • In graph theory, Ozaki, Shimbo, Komachi, and Matsumoto (2011) have observed that knn graphs often produce hubs, i.e. vertices with extremely high numbers of edges to other vertices

  • We report the size of the largest strongly connected component as a percentage of the whole database, the number of additional strongly connected components (#scc) and their average size in number of vertices. φ-edge ratio (φ): for a labelled graph (G, l), a φ-edge is any edge e_ij for which l_i ≠ l_j (Ozaki et al, 2011), i.e. in our case an edge for which the genres l of the songs corresponding to the vertices v_i and v_j do not match

  • Corroborating earlier results (Schnitzer et al, 2012), we have found that due to hubness only two-thirds of the music catalogue is reachable at all, and, judging by the size of the largest strongly connected component, only about a third of the songs are likely to be listened to
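The graph statistics listed above can be computed with a short, dependency-free Python sketch. Several details are our assumptions rather than the paper's stated procedure: the graph is given as a list of directed (i, j) edges, so an undirected mutual knn edge must be listed in both directions; #scc counts every strongly connected component other than the largest, singletons included; the φ-edge ratio is the number of φ-edges divided by the total number of edges; and a song counts as reachable if at least one edge points to it. The function names are hypothetical.

```python
from collections import defaultdict

def strongly_connected_components(n, edges):
    """Kosaraju's algorithm (iterative) over vertices 0..n-1 and directed edges (i, j)."""
    succ, pred = defaultdict(list), defaultdict(list)
    for i, j in edges:
        succ[i].append(j)
        pred[j].append(i)
    # Pass 1: depth-first search, recording vertices in order of completion.
    visited, order = [False] * n, []
    for s in range(n):
        if visited[s]:
            continue
        visited[s] = True
        stack = [(s, iter(succ[s]))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if not visited[w]:
                    visited[w] = True
                    stack.append((w, iter(succ[w])))
                    break
            else:  # all successors explored
                stack.pop()
                order.append(v)
    # Pass 2: search the reversed graph in reverse completion order.
    comp, components = [-1] * n, []
    for s in reversed(order):
        if comp[s] != -1:
            continue
        cid, members, stack = len(components), [s], [s]
        comp[s] = cid
        while stack:
            v = stack.pop()
            for w in pred[v]:
                if comp[w] == -1:
                    comp[w] = cid
                    members.append(w)
                    stack.append(w)
        components.append(members)
    return components

def graph_report(n, edges, labels):
    """Largest SCC (% of vertices), #scc, their mean size, φ-edge ratio, % reachable."""
    comps = sorted(strongly_connected_components(n, edges), key=len, reverse=True)
    others = comps[1:]
    in_degree = [0] * n
    for _, j in edges:
        in_degree[j] += 1
    # Assumption: φ-edge ratio = share of edges joining songs of different genres.
    phi = sum(labels[i] != labels[j] for i, j in edges) / len(edges) if edges else 0.0
    return {
        "largest_scc_pct": 100.0 * len(comps[0]) / n,
        "n_scc": len(others),
        "avg_scc_size": sum(map(len, others)) / len(others) if others else 0.0,
        "phi_edge_ratio": phi,
        # Assumption: reachable = appears in at least one recommendation list.
        "reachable_pct": 100.0 * sum(d > 0 for d in in_degree) / n,
    }
```

For example, a six-vertex graph with a three-cycle, a two-cycle and one extra vertex pointing into the cycle has a largest component covering 50% of the vertices, two additional components of average size 1.5, and five of six vertices reachable.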


Summary

Introduction

In graph theory, Ozaki, Shimbo, Komachi, and Matsumoto (2011) have observed that knn graphs often produce hubs, i.e. vertices with extremely high numbers of edges to other vertices. While hubs appear very close to many other vertices, anti-hubs appear distant from all vertices. Both phenomena arise from the concentration of distance measures (Francois, Wertz, & Verleysen, 2007) in high-dimensional spaces. This hubness phenomenon has been shown to negatively impact a real-world music recommendation system built by our research team, which uses visualisation of a knn graph to recommend music via a web interface. We have already applied ‘mutual proximity’ (Schnitzer et al, 2012), a hubness reduction method, to improve this situation, but have not yet explored this topic in graph-theoretical terms.
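The link between distance concentration and hubs can be illustrated with a small experiment. The sketch below uses synthetic i.i.d. Gaussian data (an assumption; these are not the audio features of the music system) and compares 3 and 100 dimensions. Concentration is summarised as the ratio of standard deviation to mean of all pairwise distances, and hubness through the k-occurrence N_k, i.e. how often each point appears in the k-nearest-neighbour lists of the other points, together with the skewness of its distribution. Hubs show up as very large N_k values and anti-hubs as N_k = 0. Function names and parameter choices (500 points, k = 10) are illustrative.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix computed via the Gram-matrix identity."""
    sq = np.sum(X ** 2, axis=1)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))

def k_occurrence(D, k):
    """N_k: how many times each point occurs in the knn lists of all other points."""
    D = D.copy()
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbour
    nn = np.argsort(D, axis=1)[:, :k]
    return np.bincount(nn.ravel(), minlength=D.shape[0])

def skewness(x):
    """Sample skewness; large positive values indicate a few strong hubs."""
    c = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(c ** 3) / np.mean(c ** 2) ** 1.5

def relative_spread(D):
    """Std / mean of all pairwise distances; smaller values mean stronger concentration."""
    iu = np.triu_indices(D.shape[0], k=1)
    return D[iu].std() / D[iu].mean()

rng = np.random.default_rng(0)
results = {}
for dim in (3, 100):
    D = pairwise_distances(rng.normal(size=(500, dim)))
    N10 = k_occurrence(D, k=10)
    results[dim] = (relative_spread(D), skewness(N10), int(N10.max()), int((N10 == 0).sum()))
    print(f"dim={dim:3d}  spread={results[dim][0]:.3f}  skew(N10)={results[dim][1]:.2f}  "
          f"max N10={results[dim][2]}  anti-hubs={results[dim][3]}")
```

In runs of this kind, the higher-dimensional data typically shows a smaller relative spread of distances and a more strongly right-skewed N_k distribution, which is the pattern described above.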
