SEC: More Accurate Clustering Algorithm via Structural Entropy

Junyu Huang, Jianxin Wang, Qilong Feng, Ziyun Huang, Jinhui Xu, Jiahui Wang

doi:10.1609/aaai.v38i11.29152

Abstract

As one of the most popular machine learning tools in unsupervised learning, clustering has been widely used in practical applications. Although numerous clustering methods have been proposed, a common issue is that they rely heavily on local neighborhood information during optimization, which leads to suboptimal performance on real-world datasets. Moreover, most existing clustering methods measure the similarity between data points with Euclidean distances or densities, which can limit their effectiveness on datasets with irregular patterns. A key challenge is therefore how to effectively capture the global structural information of a clustering instance to improve clustering quality. In this paper, we propose a new clustering algorithm, called SEC, which uses global structural information extracted from an encoding tree to guide the clustering optimization process. SEC first constructs a sparse graph of the clustering instance from the relations between its data points. Using this sparse graph, we propose an iterative encoding tree method in which hierarchical abstractions of the encoding tree are iteratively extracted as new clustering features to obtain better clustering results. To reduce the influence of easily misclustered data points located on the boundaries of the clustering partitions, which we call "fringe points", we propose an iterative pre-deletion and reassignment technique that deletes and reassigns these fringe points to obtain more resilient and precise clustering results. Experiments on both synthetic and real-world datasets demonstrate that SEC outperforms state-of-the-art clustering methods.
On synthetic datasets, SEC improves clustering accuracy (ACC) by 1.7% and normalized mutual information (NMI) by 7.9% on average compared with the current state-of-the-art (SOTA) algorithm. On real-world datasets, it outperforms the other clustering methods, with average increases of 12.3% in ACC and 5.2% in NMI.
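
The abstract says SEC builds a sparse graph from the relations between data points and then works with an encoding tree over that graph. The sketch below illustrates the quantity behind that idea under two assumptions not stated in the abstract: the sparse graph is a k-nearest-neighbor graph, and the encoding tree has just two levels, with one tree node per cluster. It computes the standard two-dimensional structural entropy of a given partition; it does not reproduce SEC's iterative encoding-tree method. The function names `knn_graph` and `structural_entropy_2d` are hypothetical.

```python
import math
from collections import defaultdict


def knn_graph(points, k):
    """Return the edge set of an undirected, unweighted k-nearest-neighbor graph.

    Edge (i, j) with i < j is present if j is among the k nearest neighbors
    of i under Euclidean distance, or vice versa. (Assumed graph construction.)
    """
    n = len(points)
    edges = set()
    for i in range(n):
        # All other points, ordered by distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def structural_entropy_2d(n, edges, labels):
    """Structural entropy of a two-level encoding tree (root -> clusters -> vertices).

    H = - sum_i (d_i / vol(G)) * log2(d_i / vol(X_j(i)))
        - sum_j (g_j / vol(G)) * log2(vol(X_j) / vol(G)),
    where d_i is the degree of vertex i, vol(X) is the total degree inside
    cluster X, and g_j is the number of edges leaving cluster X_j.
    """
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    vol_g = sum(degree)

    vol = defaultdict(int)  # vol(X_j): sum of degrees of the vertices in cluster j
    cut = defaultdict(int)  # g_j: number of edges with exactly one endpoint in cluster j
    for i in range(n):
        vol[labels[i]] += degree[i]
    for i, j in edges:
        if labels[i] != labels[j]:
            cut[labels[i]] += 1
            cut[labels[j]] += 1

    h = 0.0
    # Vertex-level term: uncertainty of locating a vertex inside its cluster.
    for i in range(n):
        if degree[i] > 0:
            h -= degree[i] / vol_g * math.log2(degree[i] / vol[labels[i]])
    # Cluster-level term: uncertainty of entering a cluster from outside it.
    for c, v in vol.items():
        if cut[c] > 0:
            h -= cut[c] / vol_g * math.log2(v / vol_g)
    return h


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    g = knn_graph(pts, k=2)  # two disconnected triangles
    for name, labels in [
        ("true split", [0, 0, 0, 1, 1, 1]),
        ("interleaved", [0, 1, 0, 1, 0, 1]),
        ("one cluster", [0] * 6),
    ]:
        print(name, round(structural_entropy_2d(6, g, labels), 3))
```

On the two well-separated triangles in the example, the true split scores log2(3) ≈ 1.585, lower than the interleaved partition (≈ 2.252) and the single-cluster partition (log2(6) ≈ 2.585), which is why encoding-tree methods generally search for low-entropy partitions.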

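The abstract describes fringe points as easily misclustered points on partition boundaries that SEC deletes and reassigns iteratively, but it does not give the deletion criterion or the reassignment rule. The sketch below is therefore a hypothetical stand-in, not the paper's technique: a point is treated as a fringe point when fewer than `threshold` of its neighbors share its label, fringe points are temporarily unassigned, and they are then reassigned by majority vote among already-labeled neighbors. `fringe_points`, `delete_and_reassign`, `threshold`, and `max_rounds` are illustrative names.

```python
from collections import Counter


def fringe_points(labels, neighbors, threshold=0.5):
    """Indices whose neighbor-label agreement is below `threshold` (hypothetical criterion)."""
    fringe = []
    for i, nbrs in enumerate(neighbors):
        if not nbrs:
            continue
        agreement = sum(labels[j] == labels[i] for j in nbrs) / len(nbrs)
        if agreement < threshold:
            fringe.append(i)
    return fringe


def delete_and_reassign(labels, neighbors, threshold=0.5, max_rounds=10):
    """Iteratively delete fringe points and reassign them by neighbor majority vote.

    Points that cannot be reassigned (no labeled neighbors) are left as None.
    """
    labels = list(labels)
    for _ in range(max_rounds):
        fringe = fringe_points(labels, neighbors, threshold)
        if not fringe:
            break
        # Pre-deletion: temporarily remove fringe points from their clusters.
        for i in fringe:
            labels[i] = None
        # Reassignment: votes come only from points that currently have a label.
        pending = set(fringe)
        while pending:
            assigned = False
            for i in sorted(pending):
                votes = Counter(labels[j] for j in neighbors[i] if labels[j] is not None)
                if votes:
                    labels[i] = votes.most_common(1)[0][0]
                    pending.discard(i)
                    assigned = True
            if not assigned:
                break  # remaining points have no labeled neighbors
    return labels


if __name__ == "__main__":
    # Neighbor lists of two 3-point groups; point 2 starts with the wrong label.
    nbrs = [[1, 2], [0, 2], [0, 1], [4, 5], [3, 5], [3, 4]]
    print(delete_and_reassign([0, 0, 1, 1, 1, 1], nbrs))
```

Because reassignment only reads labels from points that are still assigned, labels spread from the cluster cores into the deleted boundary region; the `max_rounds` cap guards against labels oscillating between rounds.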

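The abstract reports its gains in clustering accuracy (ACC) and normalized mutual information (NMI). As a reference for how these two scores are usually computed, here is a minimal pure-Python sketch. The helper names are hypothetical, ACC is found by brute force over cluster-to-class mappings (practical only for a few clusters; the Hungarian algorithm, e.g. SciPy's `linear_sum_assignment`, is the usual choice at scale), and the arithmetic-mean NMI normalization (the default in scikit-learn's `normalized_mutual_info_score`) is an assumption, since the abstract does not say which variant the authors used.

```python
import itertools
import math
from collections import Counter


def clustering_accuracy(y_true, y_pred):
    """ACC: fraction of points labeled correctly under the best one-to-one
    mapping from predicted clusters to true classes (brute force, small k only)."""
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    # Pad with None so surplus clusters can map to "no class" (never correct).
    classes += [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in itertools.permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[p] == t for t, p in zip(y_true, y_pred))
        best = max(best, hits)
    return best / len(y_true)


def nmi(y_true, y_pred):
    """NMI with arithmetic-mean normalization: I(T; P) / ((H(T) + H(P)) / 2)."""
    n = len(y_true)
    count_t = Counter(y_true)
    count_p = Counter(y_pred)
    joint = Counter(zip(y_true, y_pred))
    # Mutual information from the contingency counts (natural log).
    mi = sum(c / n * math.log(c * n / (count_t[t] * count_p[p]))
             for (t, p), c in joint.items())
    h_t = -sum(c / n * math.log(c / n) for c in count_t.values())
    h_p = -sum(c / n * math.log(c / n) for c in count_p.values())
    denom = (h_t + h_p) / 2
    return mi / denom if denom > 0 else 1.0  # both labelings are a single cluster


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    pred = [0, 0, 1, 1, 1, 1]
    print(clustering_accuracy(truth, pred), nmi(truth, pred))
```

Both scores ignore how clusters are numbered: relabeling the predicted clusters changes neither ACC nor NMI.
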
Published in: Proceedings of the AAAI Conference on Artificial Intelligence