A Survey of Clustering Algorithms for High-Dimensional Data Mining

Anil Kukreti

doi:10.17762/msea.v70i2.2473

Abstract

The increasing complexity and dimensionality of data in domains such as bioinformatics, text mining, multimedia, and social network analysis pose significant challenges to traditional clustering algorithms. This paper provides a comprehensive review of clustering algorithms suitable for high-dimensional data mining. We begin by describing the difficulties intrinsic to high-dimensional spaces, such as the curse of dimensionality and the concentration of measure. We then examine the main families of clustering approaches: partitioning, grid-based, hierarchical, density-based, and model-based methods. Each algorithm is assessed on how well it handles high-dimensional data, its robustness to noise, its scalability, and its interpretability. The review also covers recent developments in subspace and correlation clustering, embedding approaches, and the application of deep learning to the clustering of high-dimensional data. Our goals are to give insight into the benefits and drawbacks of each algorithm and to help academics and practitioners select the most suitable approach for their high-dimensional data mining projects.

Full Text