An extended study of the K-means algorithm for data clustering and its applications

Ja-Shen Chen, Russell K H Ching, Yi-Shen Lin

doi:10.1057/palgrave.jors.2601732

Abstract

The K-means algorithm is a widely applied clustering technique, especially in marketing research. Despite its popularity and its ability to process large volumes of data quickly and efficiently, K-means has drawbacks, notably limited solution quality and a lack of robustness. This paper presents an extended study of the K-means algorithm. We propose a new clustering algorithm that integrates concepts from hierarchical approaches with the K-means algorithm to improve both solution quality and robustness. The proposed algorithm and its score function are introduced and discussed in detail. We also present comparison studies against the K-means algorithm and three popular K-means initialization methods on five well-known test data sets. Finally, a business application that segments credit card users demonstrates the algorithm's capability.

Full Text