Range clustering: An algorithm for empirical evaluation of classical clustering algorithms

Nishant Arora, Sandeep Jain, Santosh Kumar Verma

doi:10.1109/ic3.2016.7880242

Abstract

Cluster analysis is a principal method in the analytics domain of data mining. The choice of clustering algorithm directly influences the clusters obtained. Data clustering is done to identify patterns and trends that are not apparent from inspecting the data directly. Clustering may be supervised (if a training data set is available) or unsupervised (if no training data set is available). Unsupervised clustering is usually done using the k-means algorithm, with any distance measure, the most common being Euclidean and Manhattan distance. The drawback of the k-means algorithm for a large data set is the heavy computation required in every iteration to partition the data set into multiple subsets, which limits its efficiency and its use on large data sets. We propose a range-based, single-pass clustering algorithm that clusters data on the basis of the range in which each value falls, where the ranges are calculated using the simple arithmetic mean between two values. The proposed algorithm is compared against the standard k-means algorithm (using Euclidean distance and Manhattan distance).
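
The abstract does not spell out how the ranges are built, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes one-dimensional data and a caller-supplied list of reference values (hypothetical; the abstract names no such input), takes each range boundary to be the arithmetic mean of two consecutive reference values, and assigns every data value to its range in a single pass. The names `range_boundaries`, `range_cluster`, and `reference_values` are illustrative.

```python
from bisect import bisect_right


def range_boundaries(reference_values):
    """Return the arithmetic means of consecutive sorted reference values.

    Assumption: range edges are midpoints between neighbouring reference
    values; the abstract only says ranges use a "simple arithmetic mean
    between two values".
    """
    ordered = sorted(reference_values)
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]


def range_cluster(data, reference_values):
    """Label each value with the index of the range it falls in (one pass)."""
    boundaries = range_boundaries(reference_values)
    # bisect_right counts how many boundaries are <= x, which is the range index.
    return [bisect_right(boundaries, x) for x in data]


data = [1.0, 2.0, 2.5, 9.0, 10.0, 11.0, 20.0]
labels = range_cluster(data, reference_values=[2.0, 10.0, 20.0])
print(labels)
```

Under these assumptions each value is examined once and no distances are recomputed across iterations, which is the cost the abstract attributes to k-means. How the reference values would be chosen in practice is left open here.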
