A density invariant approach to clustering

Manish Kashyap, Mahua Bhattacharya

doi:10.1007/s00521-015-2145-z

Abstract

Organizing data into sensible groups is known as ‘data clustering.’ It is an open research problem in various scientific fields, and the literature offers neither a universal solution nor an absolute strategy for evaluating one. In this context, this paper makes the following three contributions: (1) A new method for finding ‘natural groupings’ or clusters in a data set is presented. For this, a new term, ‘vicinity,’ is coined. Vicinity captures the idea of density together with the spatial distribution of data points in feature space, and it has the potential to separate various types of clusters. In summary, the approach presented here is non-convex admissive (i.e., the convex hulls of the clusters found can intersect, which is desirable for non-convex clusters), cluster proportion and omission admissive (i.e., duplicating a cluster an arbitrary number of times or deleting a cluster does not alter the other clusters’ boundaries), scale covariant, consistent (shrinking within-cluster distances and enlarging inter-cluster distances does not affect the clustering results), and density invariant, but not rich (it does not generate exhaustive partitions of the data). (2) A strategy for automatically detecting the various tunable parameters of the proposed ‘Vicinity Based Cluster Detection’ (VBCD) algorithm is presented. (3) A new internal evaluation index, the ‘Space-Density Index’ (SDI), for clustered results (produced by any method) is also presented. Experimental results reveal that VBCD captures the idea of ‘natural groupings’ better than existing approaches, and that the SDI evaluation scheme provides better judgments than earlier internal cluster validity indices.
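The cluster omission and cluster proportion admissibility properties named in the abstract can be checked mechanically for any clustering routine on a given sample. The sketch below is a hypothetical test harness, not VBCD and not the paper's evaluation procedure: `threshold_linkage` (connected components of an ε-neighbourhood graph) and `k_gap_1d` (a fixed-k cut at the largest gaps along the first coordinate) are illustrative stand-in clusterers, and all names, parameters, and data are assumptions. Partitions are compared up to relabelling, and a passing check on one sample is evidence of the property, not a proof.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Clusterer = Callable[[Sequence[Point]], List[int]]


def threshold_linkage(points: Sequence[Point], eps: float = 1.0) -> List[int]:
    """Hypothetical stand-in (NOT VBCD): connected components of the eps-graph."""
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first flood fill over eps-neighbours
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] == -1 and math.dist(points[i], points[j]) <= eps:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def k_gap_1d(points: Sequence[Point], k: int = 3) -> List[int]:
    """Hypothetical fixed-k clusterer: cut the x-sorted points at the k-1 largest gaps."""
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    gaps = [(points[order[r + 1]][0] - points[order[r]][0], r)
            for r in range(len(order) - 1)]
    cuts = {r for _, r in sorted(gaps, reverse=True)[:k - 1]}
    labels, lab = [0] * len(points), 0
    for r, i in enumerate(order):
        labels[i] = lab
        if r in cuts:  # start a new cluster after this sorted position
            lab += 1
    return labels


def as_partition(labels: Sequence[int]) -> set:
    """Label-free view of a clustering: a set of frozensets of point indices."""
    groups: dict = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, set()).add(idx)
    return {frozenset(g) for g in groups.values()}


def omission_admissive(cluster: Clusterer, points: Sequence[Point], drop: int) -> bool:
    """Delete cluster `drop`, recluster, and check the survivors' partition is unchanged."""
    labels = cluster(points)
    keep = [i for i, lab in enumerate(labels) if lab != drop]
    before = as_partition([labels[i] for i in keep])
    after = as_partition(cluster([points[i] for i in keep]))
    return before == after


def proportion_admissive(cluster: Clusterer, points: Sequence[Point],
                         dup: int, times: int = 3) -> bool:
    """Append `times` extra copies of cluster `dup`; the original points' partition must not change."""
    labels = cluster(points)
    extra = [p for p, lab in zip(points, labels) if lab == dup] * times
    new_labels = cluster(list(points) + extra)
    return as_partition(labels) == as_partition(new_labels[:len(points)])


# Illustrative sample with three well-separated groups (made-up data).
POINTS = [(0, 0), (0.5, 0), (0.5, 0.5), (5, 5), (5.5, 5), (10, 0), (10.4, 0.3)]

print(omission_admissive(threshold_linkage, POINTS, drop=1))    # True
print(proportion_admissive(threshold_linkage, POINTS, dup=0))   # True
print(omission_admissive(k_gap_1d, POINTS, drop=1))             # False
```

On this sample, the fixed-k cut fails the omission check: after one group is deleted it must still return three clusters, so it splits a surviving group. This illustrates why methods that require the number of clusters in advance generally cannot be omission admissive.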
