Sparse clusterability: testing for cluster structure in high dimensions

Jose Laborde, Paul A Stewart, Yian A Chen, Naomi C Brownstein, Zhihua Chen

doi:10.1186/s12859-023-05210-6

Open Access

https://doi.org/10.1186/s12859-023-05210-6

Abstract

Background: Cluster analysis is utilized frequently in scientific theory and applications to separate data into groups. A key assumption in many clustering algorithms is that the data was generated from a population consisting of multiple distinct clusters. Clusterability testing allows users to question the inherent assumption of latent cluster structure, a theoretical requirement for meaningful results in cluster analysis.

Results: This paper proposes methods for clusterability testing designed for high-dimensional data by utilizing sparse principal component analysis. Type I error and power of the clusterability tests are evaluated using simulated data with different types of cluster structure in high dimensions. Empirical performance of the new methods is evaluated and compared with prior methods on gene expression, microarray, and shotgun proteomics data. Our methods had reasonably low Type I error and maintained power for many datasets with a variety of structures and dimensions. Cluster structure was not detectable in other datasets with spatially close clusters.

Conclusion: This is the first analysis of clusterability testing on both simulated and real-world high-dimensional data.
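The general idea of reducing high-dimensional data with a sparse principal component and then testing the one-dimensional scores for multimodality can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name a specific sparse PCA solver or test statistic, so the sketch substitutes a soft-thresholded power iteration for sparse PCA and a bimodality coefficient calibrated against a simulated Gaussian null for the clusterability test. All function names, the threshold `lam`, and the synthetic data are hypothetical.

```python
import math
import random


def sparse_first_component(X, lam=0.2, iters=50):
    """Return (loadings, scores) for one sparse component.

    Assumption: power iteration with relative soft-thresholding, used here
    as a simple stand-in for a full sparse PCA solver.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    # Start from the coordinate axis with the largest variance.
    variances = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    start = variances.index(max(variances))
    v = [1.0 if j == start else 0.0 for j in range(p)]
    for _ in range(iters):
        scores = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
        # w = (Xc^T Xc / n) v, the covariance matrix applied to v.
        w = [sum(Xc[i][j] * scores[i] for i in range(n)) / n for j in range(p)]
        # Shrink every entry toward zero; small loadings become exactly zero.
        cut = lam * max(abs(x) for x in w)
        w = [math.copysign(max(abs(x) - cut, 0.0), x) for x in w]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
    return v, scores


def bimodality_coefficient(x):
    """(skewness^2 + 1) / kurtosis, using population moments.

    Near 1/3 for Gaussian data; approaches 1 for two well-separated groups.
    """
    n = len(x)
    m = sum(x) / n
    m2 = sum((a - m) ** 2 for a in x) / n
    m3 = sum((a - m) ** 3 for a in x) / n
    m4 = sum((a - m) ** 4 for a in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return (skew ** 2 + 1.0) / kurt


def clusterability_pvalue(scores, n_sim=199, seed=0):
    """Monte Carlo p-value against a Gaussian (unimodal) null.

    Assumption: a Gaussian null is a simplification chosen for this sketch.
    """
    rng = random.Random(seed)
    observed = bimodality_coefficient(scores)
    n = len(scores)
    exceed = 0
    for _ in range(n_sim):
        sim = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if bimodality_coefficient(sim) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


def make_two_cluster_data(n=100, p=20, shift=3.0, seed=1):
    """Synthetic data: two groups that differ only in features 0 and 1."""
    rng = random.Random(seed)
    X = []
    for i in range(n):
        center = shift if i % 2 == 0 else -shift
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        row[0] += center
        row[1] += center
        X.append(row)
    return X


if __name__ == "__main__":
    X = make_two_cluster_data()
    loadings, scores = sparse_first_component(X)
    selected = [j for j, w in enumerate(loadings) if w != 0.0]
    print("features with nonzero loading:", selected)
    print("p-value:", clusterability_pvalue(scores))
```

A small p-value suggests the projected data are not consistent with a single unimodal population. The sparsity step matters in high dimensions because it restricts the projection to a few features, so noise features contribute nothing to the scores being tested.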

Journal: BMC Bioinformatics
Publication Date: Mar 31, 2023
Citations: 1
License type: open-access

Similar Papers

Machine-learned cluster identification in high-dimensional data
Alfred Ultsch ... Jörn Lötsch
Journal of Biomedical Informatics | VOL. 66
28 Dec 2016

Detecting stable clusters using principal component analysis.
Asa Ben-Hur ... Isabelle Guyon
Methods in molecular biology (Clifton, N.J.) | VOL. 224
01 Jan 2003

Review of Traditional and Ensemble Clustering Algorithms for High Dimensional Data
K Kalaiselvi ... Karthika D
SSRN Electronic Journal | VOL. -
01 Jan 2018

ClusterBMA: Bayesian model averaging for clustering.
Owen Forbes ... Daniel F Hermens
PLOS ONE | VOL. 18
21 Aug 2023
