Fast Efficient Clustering Algorithm for Balanced Data

Adel A, Rasha M, Ahmed I, M H

doi:10.14569/ijacsa.2014.050619

Abstract

Cluster analysis is a major technique in statistical analysis, machine learning, pattern recognition, data mining, image analysis, and bioinformatics. The k-means algorithm is one of the most important clustering algorithms. However, it needs a large amount of computational time to handle large data sets. In this paper, we develop a more efficient clustering algorithm, named Fast Balanced k-means (FBK-means), to overcome this deficiency. This algorithm not only yields clustering results as good as those of the k-means algorithm but also requires less computational time. The algorithm works well on balanced data.

Highlights

The problem of clustering is perhaps one of the most widely studied in the data mining and machine learning communities
In the k-means algorithm, a cluster is represented by the mean value of the data points within it, and clustering is done by minimizing the sum of distances between data points and their corresponding cluster centers
The genetic clustering algorithm (GA) parameters used in the experiments: population size = 10, roulette selection, single-point crossover, the probability of crossover

Summary

INTRODUCTION

The problem of clustering is perhaps one of the most widely studied in the data mining and machine learning communities. The k-means clustering algorithm [7] is one of the most efficient clustering algorithms for large-scale spherical data sets. It has extensive applications in such domains as financial fraud, medical diagnosis, image processing, information retrieval, and bioinformatics [8]. The k-means algorithm and its variants are known to be fast algorithms for solving such problems. However, they are sensitive to the choice of starting points and can only be applied to small datasets [10]. The multi-restart k-means algorithm becomes very time consuming and inefficient for solving clustering problems, even in moderately large datasets [11]. This paper proposes a new algorithm for clustering large data sets, called FBK-means.

K-means algorithm
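
As a point of reference, the standard k-means iteration (not FBK-means, whose details are not given in this text) can be sketched as follows. Each cluster is represented by the mean of its points, and each point is assigned to its nearest center. The function name `kmeans`, the random choice of initial centers, the Euclidean distance, and the `max_iter` and `seed` parameters are assumptions made for illustration only.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length numeric tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct data points drawn at random.
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center (Euclidean).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change, so the iteration has converged
        labels = new_labels
        # Update step: each center becomes the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels
```

Because the result depends on the initial centers, a common workaround is to run the procedure from several starting points and keep the best result, which is the multi-restart cost the introduction refers to.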

EXPERIMENTAL RESULTS
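
The genetic clustering algorithm (GA) used for comparison is described as using roulette selection and single-point crossover with a population of 10. A minimal sketch of those two operators is given below, under stated assumptions: the function names are hypothetical, fitness values are taken to be non-negative with a positive sum (for example, an inverse of the clustering cost), and a chromosome is any list of length at least 2 (for example, one cluster label per data point). The crossover probability is not given in this text, so it is left out.

```python
import random


def roulette_select(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitness)
    threshold = rng.random() * total  # uniform in [0, total)
    running = 0.0
    for individual, fit in zip(population, fitness):
        running += fit
        if running > threshold:
            return individual
    return population[-1]  # guard against floating-point round-off


def single_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length chromosomes at one random cut point."""
    cut = rng.randint(1, len(parent_a) - 1)  # cut lies strictly inside the chromosome
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when comparing computational time across algorithms.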

SUMMARY OF THE DATASETS

CONCLUSION