Efficient and robust clustering based on backbone identification

Hassan Motallebi

doi:10.1016/j.patcog.2024.110635

Abstract

Clustering is the process of partitioning data objects into subsets so that similar objects fall into the same subset. Inspired by the popularity of individuals in a community, we assign each sample a popularity score that reflects its centrality in the dataset. To identify clusters with arbitrary shapes and varying densities, we propose a clustering approach that divides samples into separate population groups. The approach is based on identifying the backbone of the data, characterized by a set of popular points surrounded by less popular points. To distinguish poorly separated clusters, we define a proximity measure based on the popularity of samples. We also use the popularity of samples to assign halo points to clusters and to calculate the cohesion between clusters. The proposed method detects arbitrary-shaped clusters with varying densities without requiring the number of clusters to be specified in advance. Outliers are also identified according to their popularity. We demonstrate the effectiveness of the approach on synthetic and real-world datasets.

Full Text