Abstract

K-means is the best-known and most widely used classical clustering method, owing to its efficiency and ease of implementation. However, k-means has three main drawbacks: its final results depend strongly on the selection of the initial cluster centers, the number of clusters must be specified in advance, and it can only find clusters of similar sizes. Much work has been done on improving the selection of the initial cluster centers and on determining the number of clusters, but very little on enabling k-means to handle clusters of different sizes. In this paper, we propose a new clustering method, called k-normal, whose main idea is to learn cluster sizes in the same process that learns cluster centers. The proposed k-normal method can identify clusters of different sizes while retaining the efficiency of k-means. Although the Expectation-Maximization (EM) method based on Gaussian mixture models can also identify clusters of different sizes, it has a much higher computational complexity than both k-normal and k-means. Experiments on a synthetic dataset and seven real datasets show that k-normal outperforms k-means on all the datasets; compared with the EM method, k-normal still produces better results on six of the eight datasets while enjoying much higher efficiency.
