Social Distance metric: from coordinates to neighborhoods

Vagan Terziyan

doi:10.1080/13658816.2017.1367796

Open Access

Abstract

The choice of a distance metric is key to success in many machine learning and data processing tasks. The distance between two data samples traditionally depends on the values of their attributes (coordinates) in a data space. Some metrics also take into account the distribution of samples within the space (e.g. local densities), aiming to improve classification or clustering performance. In this paper, we suggest the Social Distance metric, which can be used on top of any traditional metric. For a pair of samples x and y, it averages two numbers: the place (rank) that sample y holds in the ordered list of nearest neighbors of x, and, vice versa, the rank of x in the list of nearest neighbors of y. The average used is the contraharmonic (Lehmer) mean, which penalizes the difference between the two numbers by giving values greater than the arithmetic mean when the arguments are unequal. We consider the normalized average as a distance function and prove it to be a metric. We present several modifications of this metric and show that their properties are useful for a variety of classification and clustering tasks in data spaces or graphs, in a Geographic Information Systems context and beyond.
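
The rank-averaging idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's reference implementation. Several choices here are assumptions made only for illustration: Euclidean distance as the underlying traditional metric, breaking ties in distance by sample index, and normalizing by n - 1 (the largest possible rank among n samples), since the abstract does not state the exact normalization. The function names `neighbor_rank`, `contraharmonic_mean` and `social_distance` are hypothetical.

```python
import math


def neighbor_rank(points, i, j):
    """Rank of points[j] in the nearest-neighbor list of points[i] (1 = nearest).

    Assumption: Euclidean base metric; ties are broken by sample index.
    """
    others = [k for k in range(len(points)) if k != i]
    others.sort(key=lambda k: (math.dist(points[i], points[k]), k))
    return others.index(j) + 1


def contraharmonic_mean(a, b):
    """Contraharmonic (Lehmer, p = 2) mean of two positive numbers."""
    return (a * a + b * b) / (a + b)


def social_distance(points, i, j):
    """Normalized contraharmonic mean of the two mutual neighbor ranks.

    Assumption: normalization divides by n - 1, the maximum possible rank,
    so the result lies in [0, 1]. The paper's normalization may differ.
    """
    if i == j:
        return 0.0
    r_ij = neighbor_rank(points, i, j)  # rank of j as seen from i
    r_ji = neighbor_rank(points, j, i)  # rank of i as seen from j
    return contraharmonic_mean(r_ij, r_ji) / (len(points) - 1)


# Five collinear samples: sample 1 is the nearest neighbor of sample 0,
# but sample 0 is only the third nearest neighbor of sample 1.
points = [(0.0, 0.0), (1.0, 0.0), (1.4, 0.0), (1.7, 0.0), (10.0, 0.0)]

print(neighbor_rank(points, 0, 1))    # rank of sample 1 from sample 0
print(neighbor_rank(points, 1, 0))    # rank of sample 0 from sample 1
print(social_distance(points, 0, 1))  # normalized contraharmonic mean of the ranks
```

In this example the two ranks are 1 and 3. Their arithmetic mean is 2, while the contraharmonic mean is (1 + 9) / (1 + 3) = 2.5, which shows how unequal mutual ranks are penalized.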

Full Text