Abstract

'Hubness' is a recently discovered general problem of machine learning in high-dimensional data spaces. Hub objects have a small distance to an exceptionally large number of data points, while anti-hubs are far from all other data points. Hubness is related to the concentration of distances, which impairs the contrast between distances in high-dimensional spaces. Computation of secondary distances inspired by shared nearest neighbor (SNN) approaches has been shown to reduce both hubness and concentration, and some work already exists on the direct application of SNN to hubness in image recognition. This study applies SNN to a larger number of high-dimensional real-world data sets from diverse domains and compares it to two other secondary distance approaches (local scaling and mutual proximity). SNN is shown to reduce hubness, but less than the other approaches, and, unlike its competitors, it improves classification accuracy for only half of the data sets.
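To make the technique under study concrete, the following is a minimal sketch of how SNN secondary distances and a hubness estimate are commonly computed from a primary distance matrix: the SNN distance between two objects is derived from the overlap of their k-nearest-neighbor lists, and hubness is estimated as the skewness of the k-occurrence distribution. Function names, the parameter k, and the normalization 1 - |NN_k(x) ∩ NN_k(y)| / k are illustrative assumptions, not the exact formulation evaluated in the paper.

```python
import numpy as np
from scipy.stats import skew

def knn_indices(D, k):
    """k-nearest-neighbor lists from a primary distance matrix D.
    Assumes a zero diagonal, so column 0 of the sort is the object itself."""
    return np.argsort(D, axis=1)[:, 1:k + 1]

def snn_secondary_distances(D, k=10):
    """SNN secondary distance sketch: 1 - |NN_k(x) ∩ NN_k(y)| / k."""
    n = D.shape[0]
    nn = knn_indices(D, k)
    # binary neighborhood-membership matrix: M[i, j] = 1 iff j is in NN_k(i)
    M = np.zeros((n, n), dtype=np.int32)
    M[np.repeat(np.arange(n), k), nn.ravel()] = 1
    shared = M @ M.T  # pairwise shared-neighbor counts
    return 1.0 - shared / k

def hubness(D, k=10):
    """Hubness estimate: skewness of the k-occurrence counts N_k."""
    nn = knn_indices(D, k)
    n_k = np.bincount(nn.ravel(), minlength=D.shape[0])
    return skew(n_k)
```

Under these assumptions, a hubness reduction would show up as hubness(snn_secondary_distances(D)) being smaller than hubness(D) for the same k on a given primary distance matrix D.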
