Abstract

The set of objects having same characteristics are organized in groups and clusters of these objects reformed known as Data Clustering. It is an unsupervised learning technique for classification of data. K-means algorithm is widely used and famous algorithm for analysis of clusters. In this algorithm, n number of data points are divided into k clusters based on some similarity measurement criterion. K-Means Algorithm has fast speed and thus is used commonly clustering algorithm. Vector quantization, cluster analysis, feature learning are some of the application of K-Means. However results generated using this algorithm are mainly dependant on choosing initial cluster centroids. The main short come of this algorithm is to provide appropriate number of clusters. Provision of number of clusters before applying the algorithm is highly impractical and requires deep knowledge of clustering field. In this project, we are going to propose an algorithm for improvement in the initializing the centroids for K-Means algorithm. We are going to work on numerical data sets along with the categorical datasets with the n dimensions. For similarity measurement we are going to consider the Manhattan distance,Dice distance and cosine distance. The result of this proposed algorithm will be compared with the original K-Means. Also the quality and complexity of the proposed algorithm will be checked with the existing algorithm.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call