Abstract

Clustering is a data-processing technique used extensively in data mining and in classification to find novel patterns in data. The k-means algorithm is widely used for clustering because of its simplicity and reliability. However, its accuracy and performance depend strongly on the initial choice of cluster centroids. Clustering quality is commonly measured with two parameters. The first is the sum of squared distances between each point and the centroid of its own cluster (SSW), which should be minimized. The second is the sum of squared distances between the centroids of different clusters (SSB), which should be maximized. To improve the accuracy, performance, and quality parameters of k-means, this work proposes a new Hypercube Based Cluster Initialization Method, called HYBCIM. In the proposed method, the collection of k equi-sized partitions of every dimension is modeled as a hypercube. The motivation behind the method is that clusters may spread horizontally, vertically, diagonally, or in arc shapes. The proposed method is evaluated empirically on four popular data sets, and the results show that it is superior to basic k-means. HYBCIM is applicable for clustering both discrete and continuous data. Although HYBCIM is proposed for k-means, it can also be applied with other clustering algorithms that rely on initial cluster centroids.
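The two quality parameters named above can be sketched in Python, together with the equi-sized partitioning of each dimension that the method models as a hypercube. This is an illustrative sketch under stated assumptions, not the authors' HYBCIM implementation. The helper names `ssw`, `ssb`, and `hypercube_cell` are hypothetical. SSB is assumed to be the unweighted sum over all pairs of cluster centroids, following the wording above; some texts instead weight it by cluster size relative to the global mean. The abstract does not describe how HYBCIM picks initial centroids from the hypercube cells, so that step is not modeled here.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: Sequence[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty collection of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def members_of(points: Sequence[Point], labels: Sequence[int], c: int) -> List[Point]:
    """Points whose label equals cluster index c."""
    return [p for p, lab in zip(points, labels) if lab == c]


def ssw(points: Sequence[Point], labels: Sequence[int], k: int) -> float:
    """Sum over clusters of squared distances from each point to its own
    cluster centroid (lower is better). Empty clusters contribute nothing."""
    total = 0.0
    for c in range(k):
        members = members_of(points, labels, c)
        if members:
            m = centroid(members)
            total += sum(sq_dist(p, m) for p in members)
    return total


def ssb(points: Sequence[Point], labels: Sequence[int], k: int) -> float:
    """Assumed definition: sum of squared distances between the centroids of
    every pair of distinct non-empty clusters (higher is better)."""
    cents = [centroid(members_of(points, labels, c))
             for c in range(k) if members_of(points, labels, c)]
    return sum(sq_dist(cents[i], cents[j])
               for i in range(len(cents)) for j in range(i + 1, len(cents)))


def hypercube_cell(point: Point, lo: Sequence[float], hi: Sequence[float],
                   k: int) -> Tuple[int, ...]:
    """Index of the cell containing `point` when each dimension's range
    [lo[d], hi[d]] is cut into k equal-width intervals (k**dim cells)."""
    cell = []
    for x, a, b in zip(point, lo, hi):
        width = (b - a) / k
        i = int((x - a) / width) if width > 0 else 0
        # Clamp so the range maximum falls into the last interval.
        cell.append(min(max(i, 0), k - 1))
    return tuple(cell)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    labels = [0, 0, 1, 1]
    print(ssw(pts, labels, 2))  # 4.0
    print(ssb(pts, labels, 2))  # 100.0
    print(hypercube_cell((10.0, 0.0), [0.0, 0.0], [10.0, 2.0], 2))  # (1, 0)
```

In the usage example, the two clusters have centroids (0, 1) and (10, 1). Each point lies at squared distance 1 from its centroid, which gives an SSW of 4.0. The two centroids lie at squared distance 100 from each other, which gives an SSB of 100.0. A good initialization should lead k-means toward partitions with a low SSW and a high SSB.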
