A Robust k-Means Clustering Algorithm Based on Observation Point Mechanism

Xiaoliang Zhang, Yi Jin, Honglian Qin, Joshua Zhexue Huang, Muhammad Azhar, Yulin He

doi:10.1155/2020/3650926


Open Access

https://doi.org/10.1155/2020/3650926


Abstract

The k-means algorithm is sensitive to outliers. In this paper, we propose a robust two-stage k-means clustering algorithm based on the observation point mechanism, which can accurately discover the cluster centers without being disturbed by outliers. In the first stage, a small subset of the original data set is selected based on a set of nondegenerate observation points. The subset is a good representation of the original data set because it contains only the points that lie in the higher-density regions of the original data set and excludes the outliers. In the second stage, we use the k-means clustering algorithm to cluster the selected subset and take the resulting cluster centers as the true cluster centers of the original data set. Based on these cluster centers, the remaining data points of the original data set are assigned to the clusters whose centers are closest to them. The theoretical analysis and experimental results show that the proposed clustering algorithm has lower computational complexity and better robustness than the k-means clustering algorithm, demonstrating the feasibility and effectiveness of the proposed algorithm.
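The two stages described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the abstract does not spell out how density is measured from the observation points, so the rule used here (count how many points lie at a similar distance from each observation point, and keep the points that look dense from every observation point) is an assumption. The names `select_dense_subset`, `keep_fraction`, and `bandwidth`, the farthest-first initialisation, and the toy data are all hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: dist(p, centers[j]))

def select_dense_subset(points, observers, keep_fraction=0.5, bandwidth=1.0):
    """Stage 1 (assumed rule): score each point by how many points lie at a
    similar distance from an observation point, take the worst score over all
    observation points, and keep the highest-scoring fraction of the data."""
    n = len(points)
    score = [n] * n
    for obs in observers:
        d = [dist(p, obs) for p in points]
        for i in range(n):
            count = sum(1 for j in range(n) if abs(d[i] - d[j]) <= bandwidth)
            score[i] = min(score[i], count)
    order = sorted(range(n), key=lambda i: -score[i])  # stable: ties by index
    n_keep = max(1, int(n * keep_fraction))
    return sorted(order[:n_keep])

def kmeans(points, k, iters=100):
    """Plain Lloyd iterations with a deterministic farthest-first start
    (the initialisation is a choice made for this sketch)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def two_stage_kmeans(points, k, observers, keep_fraction=0.5, bandwidth=1.0):
    """Stage 1: pick a dense subset. Stage 2: cluster the subset, then assign
    every original point (outliers included) to its nearest center."""
    kept = select_dense_subset(points, observers, keep_fraction, bandwidth)
    centers = kmeans([points[i] for i in kept], k)
    labels = [nearest(p, centers) for p in points]
    return centers, labels, kept

# Toy data: two compact 5x5 grids plus two far-away outliers.
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
cluster_a = [(x, y) for x in grid for y in grid]
cluster_b = [(x + 10.0, y + 10.0) for x in grid for y in grid]
outliers = [(40.0, -30.0), (-35.0, 45.0)]
data = cluster_a + cluster_b + outliers

# Three non-collinear observation points in 2-D (chosen by hand here).
observers = [(-20.0, -20.0), (30.0, -20.0), (5.0, 30.0)]
centers, labels, kept = two_stage_kmeans(data, k=2, observers=observers)
print(centers)
```

In this toy setup the two outliers are isolated in distance from at least one observation point, so they never enter the subset and cannot pull the centers away from the grids. The density count here is quadratic in the number of points, so the sketch says nothing about the complexity result claimed in the paper.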

Highlights

Clustering is an important research branch of data mining. The k-means algorithm is one of the most popular clustering methods [1].
We have conducted a series of experiments on 6 synthetic data sets and 3 benchmark data sets (UCI [22] and KEEL [23]) to validate the effectiveness of the proposed two-stage k-means clustering algorithm. The synthetic data sets can be downloaded from BaiduPan with the extraction code "p3mc."
There are two outliers in data set #1. We choose another four synthetic data sets, as shown in Figure 3, and three real-world data sets to compare the clustering performance of our proposed algorithm with that of the k-means algorithm. The details of these data sets and the experimental results are summarized in Table 1, where N is the number of elements in the data set, t is the proportion of outliers in the data set, k is the number of clusters, d is the dimension of the data points, p is the percentile number, n_c is the cardinality of the selected subset, ARI_kmeans and Time_kmeans are the adjusted Rand index (ARI) and time consumption of the k-means algorithm, and ARI_our and Time_our are the ARI and time consumption of our proposed algorithm.
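The adjusted Rand index used as the quality measure can be computed directly from the pair counts of a contingency table. The sketch below uses the standard pair-counting definition and the Python standard library only; the function name `adjusted_rand_index` and the convention of returning 1.0 for degenerate inputs are choices made for this illustration, and the page does not say which implementation the authors used.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """ARI from the standard pair-counting formula.

    Degenerate inputs (fewer than two items, or a zero denominator) return
    1.0 by convention in this sketch.
    """
    n = len(labels_true)
    if n < 2:
        return 1.0
    cells = Counter(zip(labels_true, labels_pred))  # contingency table n_ij
    rows = Counter(labels_true)                     # row sums a_i
    cols = Counter(labels_pred)                     # column sums b_j
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)     # chance-level agreement
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

# Same partition under different label names: perfect agreement.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Every true cluster split evenly across the predicted clusters.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same, crossed)
```

The index is invariant to how clusters are numbered, which is why the relabelled partition scores 1.0, and it can drop below zero when agreement is worse than chance, as in the second case.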

Summary


We propose a robust two-stage k-means clustering algorithm based on the observation point mechanism, which can accurately discover the cluster centers without being disturbed by outliers. A small subset of the original data set is selected based on a set of nondegenerate observation points. We use the k-means clustering algorithm to cluster the selected subset and take the resulting cluster centers as the true cluster centers of the original data set. Based on these cluster centers, the remaining data points of the original data set are assigned to the clusters whose centers are closest to them. The theoretical analysis and experimental results show that the proposed clustering algorithm has lower computational complexity and better robustness than the k-means clustering algorithm, demonstrating the feasibility and effectiveness of the proposed algorithm.

Introduction

Mathematical Principles

The Proposed Two-Stage k-Means Clustering Algorithm

Description of Algorithm

Experimental Results and Analysis

Conclusions and Future Work


Journal: Complexity
Publication Date: Mar 30, 2020
Citations: 15
License type: CC BY 4.0


Similar Papers

Real-time fault detection approach of software under big data environment
Xianrui Jian
01 Jan 2015

Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors
Juanying Xie ... Philip W Grant
Information Sciences | VOL. 354
12 Mar 2016

K-Harmonic means type clustering algorithm for mixed datasets
Amir Ahmad ... Sarosh Hashmi
Applied Soft Computing | VOL. 48
29 Jun 2016

Clustering Algorithm of Density Difference Optimized by Mixed Teaching and Learning
Hailong Chen ... Yutong Xue
SN Computer Science | VOL. 1
01 May 2020
