Abstract

For high-dimensional datasets in which clusters are formed by both distance and density structures (DDS), many clustering algorithms fail to identify these clusters correctly. This is demonstrated for 32 clustering algorithms using a suite of datasets that deliberately pose complex DDS challenges for clustering. To improve structure finding and clustering in high-dimensional DDS datasets, projection-based clustering (PBC) is introduced. Combining projection and clustering makes it possible to explore DDS through a topographic map. The map can be used to assess, first, whether any cluster tendency exists and, second, how many clusters there are. A comparison showed that PBC is always able to find the correct cluster structure, while the performance of the best of the 32 clustering algorithms varies depending on the dataset.

Highlights

  • Many data mining methods rely on some concept of the similarity between pieces of information encoded in the data of interest

  • Compact structures are mainly defined by inter- versus intracluster distances (Euclidean graph), whereas connected clusters are defined by the neighborhood and density of the data, which can be described by various other graphs (Thrun 2018)

  • Projection-based clustering (PBC) using the NeRV projection is compared to other clustering approaches combined with dimensionality reduction
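
The distinction between compact and connected structures in the highlights can be illustrated with a small toy sketch. This is not the PBC method and does not reproduce any dataset or result from the paper. It assumes two concentric rings in 2D (an invented example), a two-centroid Lloyd iteration with hand-picked starting centroids as a stand-in for a compactness-driven method, and connected components of an ε-neighborhood graph (equivalent to cutting a single-linkage dendrogram at height ε) as a stand-in for a connectivity-driven method. All function names and parameter values are hypothetical.

```python
import math

def ring(radius, n):
    """Return n evenly spaced points on a circle (half-step offset avoids points on the axes)."""
    return [(radius * math.cos(2 * math.pi * (i + 0.5) / n),
             radius * math.sin(2 * math.pi * (i + 0.5) / n)) for i in range(n)]

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def two_means(points, c0, c1, iters=20):
    """Lloyd's algorithm for k=2 with fixed (assumed) initial centroids."""
    labels = []
    for _ in range(iters):
        labels = [0 if dist(p, c0) <= dist(p, c1) else 1 for p in points]
        groups = [[p for p, l in zip(points, labels) if l == k] for k in (0, 1)]
        if not groups[0] or not groups[1]:
            break  # one cluster is empty; keep the last assignment
        c0, c1 = [(sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
                  for g in groups]
    return labels

def connected_components(points, eps):
    """Label points by connected component of the graph with edges d(p, q) <= eps."""
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] == -1 and dist(points[i], points[j]) <= eps:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def agreement(truth, pred):
    """Share of matching labels, maximised over the label swap (two-cluster case only)."""
    same = sum(t == p for t, p in zip(truth, pred))
    return max(same, len(truth) - same) / len(truth)

# Toy data: inner ring (radius 1) and outer ring (radius 5); values are assumptions.
points = ring(1.0, 40) + ring(5.0, 80)
truth = [0] * 40 + [1] * 80

compact = two_means(points, (5.0, 0.0), (-5.0, 0.0))
connected = connected_components(points, eps=1.0)

print("centroid-based agreement:    ", agreement(truth, compact))
print("connectivity-based agreement:", agreement(truth, connected))
```

On this toy data the centroid-based split cuts both rings in half, while the neighborhood graph separates them, because the rings are defined by connectivity rather than by compactness. Real DDS datasets mix both kinds of structure, which is the situation the paper addresses.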


Summary

Introduction

Many data mining methods rely on some concept of the similarity between pieces of information encoded in the data of interest. The corresponding methods can be either data-driven or need-driven. The latter, called constraint clustering (Tung et al. 2001), aims at organizing the data to meet particular application requirements. The focus here is placed on data-driven methods, in which objects are similar within clusters and dissimilar between clusters, restricted to a metric (often Euclidean dissimilarity). Cluster analysis is seen here as a step in the knowledge discovery process.

Journal of Classification (2021) 38:280–312
