Abstract

Cluster analysis is a major technique in statistical analysis, machine learning, pattern recognition, data mining, image analysis and bioinformatics. The k-means algorithm is one of the most important clustering algorithms. However, it requires a large amount of computational time when handling large data sets. In this paper, we develop a more efficient clustering algorithm, named Fast Balanced k-means (FBK-means), to overcome this deficiency. The algorithm not only yields clustering results as good as those of the k-means algorithm but also requires less computational time. It works well in the case of balanced data.

Highlights

  • The problem of clustering is perhaps one of the most widely studied in the data mining and machine learning communities

  • In the k-means algorithm, a cluster is represented by the mean value of the data points within it, and clustering is done by minimizing the sum of squared distances between data points and their corresponding cluster centers

  • The genetic clustering algorithm (GA) parameters used in the experiments: population size = 10, roulette selection, single-point crossover, the probability of crossover
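
The GA baseline above names its operators (population size 10, roulette selection, single-point crossover). The following is a minimal sketch of what those operators might look like for clustering, under assumptions that are not taken from the paper: each chromosome is the k cluster centers flattened into one list of floats, fitness is a hypothetical 1 / (1 + SSE), and helper names such as `clustering_fitness` and `next_generation` are illustrative. Mutation is omitted, and the crossover probability is left as a parameter because its value is not given here.

```python
import random
from typing import List, Sequence, Tuple

Chromosome = List[float]


def clustering_fitness(chromosome: Chromosome, data: Sequence[Sequence[float]], k: int) -> float:
    """Hypothetical fitness: decode k centers from a flat chromosome and
    return 1 / (1 + SSE), so a lower sum of squared errors scores higher."""
    d = len(data[0])
    centers = [chromosome[j * d:(j + 1) * d] for j in range(k)]
    sse = sum(
        min(sum((x - c) ** 2 for x, c in zip(point, center)) for center in centers)
        for point in data
    )
    return 1.0 / (1.0 + sse)


def roulette_select(population: List[Chromosome], fitness: Sequence[float],
                    rng: random.Random) -> Chromosome:
    """Roulette selection: pick one chromosome with probability proportional
    to its non-negative fitness."""
    pick = rng.random() * sum(fitness)  # a point on the wheel in [0, total)
    running = 0.0
    for individual, f in zip(population, fitness):
        running += f
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off


def single_point_crossover(a: Chromosome, b: Chromosome, crossover_prob: float,
                           rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """With probability crossover_prob, swap the tails of two parents after a
    random cut point; otherwise return unchanged copies."""
    if len(a) < 2 or rng.random() >= crossover_prob:
        return list(a), list(b)
    cut = rng.randint(1, len(a) - 1)  # cut strictly inside the chromosome
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def next_generation(population: List[Chromosome], data, k: int,
                    crossover_prob: float, rng: random.Random) -> List[Chromosome]:
    """Build one new generation of the same size using selection and crossover only."""
    fitness = [clustering_fitness(ind, data, k) for ind in population]
    children: List[Chromosome] = []
    while len(children) < len(population):
        parent_a = roulette_select(population, fitness, rng)
        parent_b = roulette_select(population, fitness, rng)
        children.extend(single_point_crossover(parent_a, parent_b, crossover_prob, rng))
    return children[:len(population)]


# Usage: population size 10 as in the highlight; crossover_prob=0.8 is an
# illustrative placeholder, not the paper's setting.
rng = random.Random(42)
data = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (8.5, 9.0)]
k = 2
population = [[rng.uniform(0.0, 10.0) for _ in range(k * 2)] for _ in range(10)]
population = next_generation(population, data, k, crossover_prob=0.8, rng=rng)
```

Selection favors chromosomes whose decoded centers give a lower SSE, and crossover recombines center coordinates from two parents; a complete GA would add mutation and a stopping rule according to the paper's own settings.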

Summary

INTRODUCTION

The problem of clustering is perhaps one of the most widely studied in the data mining and machine learning communities. The k-means clustering algorithm [7] is one of the most efficient clustering algorithms for large-scale spherical data sets. It has extensive applications in domains such as financial fraud detection, medical diagnosis, image processing, information retrieval, and bioinformatics [8]. The k-means algorithm and its variants are known to be fast algorithms for solving such problems. However, they are sensitive to the choice of starting points and can only be applied to small datasets [10]. The multi-restart k-means algorithm becomes very time-consuming and inefficient for solving clustering problems, even in moderately large datasets [11]. A new algorithm, called FBK-means, is proposed for clustering large data sets.
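
Standard k-means represents each cluster by the mean of its points and seeks centers that minimize the sum of squared distances from points to their centers. Its sensitivity to starting points, and the cost of restarting it, can be illustrated with the minimal pure-Python sketch below. It assumes Euclidean distance, initial centers drawn at random from the data (Lloyd's algorithm), and that "multi-restart" means rerunning k-means from different random starts and keeping the lowest-SSE result. The function names are hypothetical, and this is not the paper's FBK-means method.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Result = Tuple[List[Point], List[int], float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(data: List[Point], k: int, rng: random.Random, max_iter: int = 100) -> Result:
    """One run of Lloyd's k-means starting from k randomly chosen data points.

    Returns (centers, labels, sse), where sse is the sum of squared distances
    from each point to the center of the cluster it is labeled with."""
    centers = rng.sample(data, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                  for p in data]
        # Update step: each center moves to the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:  # assignments are stable: converged
            break
        centers = new_centers
    sse = sum(squared_distance(p, centers[lab]) for p, lab in zip(data, labels))
    return centers, labels, sse


def multi_restart_kmeans(data: List[Point], k: int, restarts: int, seed: int = 0) -> Result:
    """Run k-means `restarts` times from different random starting points
    and keep the run with the lowest SSE."""
    if restarts < 1:
        raise ValueError("restarts must be at least 1")
    rng = random.Random(seed)
    best: Optional[Result] = None
    for _ in range(restarts):
        result = kmeans(data, k, rng)
        if best is None or result[2] < best[2]:
            best = result
    return best


data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centers, labels, sse = multi_restart_kmeans(data, k=2, restarts=5)
print(labels, round(sse, 4))
```

Because every restart is a complete k-means run over all points, the total cost grows linearly with the number of restarts, which is why the multi-restart approach becomes expensive as the data set grows.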

K-means algorithm
EXPERIMENTAL RESULTS
SUMMARY OF THE DATASETS
CONCLUSION