Abstract

Clustering data is a challenging problem in unsupervised learning, where there is no gold standard. Results depend on several factors, such as the choice of clustering method, the measure of dissimilarity, the parameters, and the determination of the number of reliable groupings. Stability has become a valuable surrogate for performance and robustness that can give an investigator insight into the quality of a clustering and guidance on subsequent cluster prioritization. This work develops a framework for stability measurements that is based on resampling and out-of-bag (OOB) estimation. Bootstrapping methods for cluster stability can be prone to overfitting, in a setting analogous to the poor delineation of training and test sets in supervised learning. Stability that relies on OOB items from a resampling overcomes these issues and does not depend on a reference clustering for comparisons. Furthermore, OOB stability can provide estimates at the level of the individual item, the cluster, and as an overall summary, which has good interpretive value. This framework is extended to produce stability estimates for determining the number of clusters (model selection) through contrasts between stability estimates on the clustered data and stability estimates on clustered reference data with no signal. These contrasts form stability profiles that can be used to identify the largest differences in stability and do not require a direct threshold on stability values, which tend to be data specific. These approaches can be implemented using the R package bootcluster, which is available on the Comprehensive R Archive Network (CRAN).
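
The sketch below illustrates the out-of-bag stability idea in base R. It is a minimal illustration under stated assumptions, not the bootcluster implementation: it assumes k-means clustering with Euclidean distance, extends each bootstrap partition to the out-of-bag observations by nearest-centroid assignment, and scores pairwise co-membership agreement between replicates restricted to observations that are out of bag in both, yielding item-level and overall summaries (a cluster-level summary is omitted for brevity). The helper name oob_stability() and its defaults are hypothetical.

## Minimal sketch (not the bootcluster API): out-of-bag (OOB) stability for a
## k-means clustering, scored by co-membership agreement between bootstrap
## replicates on their jointly out-of-bag observations.
set.seed(1)

oob_stability <- function(x, k, B = 50) {
  n      <- nrow(x)
  labels <- matrix(NA_integer_, nrow = B, ncol = n)  # labels per replicate
  oob    <- matrix(FALSE, nrow = B, ncol = n)        # OOB indicator per replicate

  for (b in seq_len(B)) {
    in_bag   <- sample.int(n, n, replace = TRUE)
    oob[b, ] <- !(seq_len(n) %in% in_bag)
    fit      <- kmeans(x[in_bag, , drop = FALSE], centers = k, nstart = 10)
    ## assign every observation (in-bag and OOB) to its nearest centroid
    d           <- as.matrix(dist(rbind(fit$centers, x)))[-(1:k), 1:k]
    labels[b, ] <- max.col(-d)
  }

  ## co-membership agreement between replicate pairs, using only observations
  ## that are out of bag in both replicates
  item_agree <- numeric(n); item_count <- numeric(n)
  pair_sum <- 0; pair_n <- 0
  for (b1 in 1:(B - 1)) {
    for (b2 in (b1 + 1):B) {
      idx <- which(oob[b1, ] & oob[b2, ])
      if (length(idx) < 2) next
      co1   <- outer(labels[b1, idx], labels[b1, idx], "==")
      co2   <- outer(labels[b2, idx], labels[b2, idx], "==")
      agree <- co1 == co2
      diag(agree) <- NA                      # ignore self-pairs
      pair_sum <- pair_sum + mean(agree, na.rm = TRUE)
      pair_n   <- pair_n + 1
      item_agree[idx] <- item_agree[idx] + rowMeans(agree, na.rm = TRUE)
      item_count[idx] <- item_count[idx] + 1
    }
  }
  list(overall = pair_sum / pair_n,                 # overall stability in [0, 1]
       item    = item_agree / pmax(item_count, 1))  # per-observation stability
}

res <- oob_stability(as.matrix(iris[, 1:4]), k = 3, B = 30)
res$overall     # overall stability summary
head(res$item)  # stability of the first few observations

Co-membership agreement is used here because cluster labels from independent k-means runs are not aligned; asking whether pairs of observations are grouped together in both replicates avoids any label-matching step and any single reference clustering.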

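To illustrate the model-selection contrast, the sketch below (again an assumption-based illustration, not the bootcluster API) reuses the hypothetical oob_stability() helper from the previous sketch. For each candidate number of clusters it contrasts the overall stability of the observed data with the average overall stability of reference data carrying no cluster signal, generated here by drawing each feature uniformly over its observed range in the style of the gap statistic.

## Hypothetical stability profile: contrast observed OOB stability with the
## OOB stability of no-signal reference data across candidate values of k.
stability_profile <- function(x, k_range = 2:6, B = 30, n_ref = 5) {
  observed  <- sapply(k_range, function(k) oob_stability(x, k, B)$overall)
  reference <- sapply(k_range, function(k) {
    mean(replicate(n_ref, {
      ## reference data with no signal: each feature drawn uniformly
      ## over its observed range
      ref <- apply(x, 2, function(col) runif(nrow(x), min(col), max(col)))
      oob_stability(ref, k, B)$overall
    }))
  })
  data.frame(k = k_range, observed = observed,
             reference = reference, contrast = observed - reference)
}

prof <- stability_profile(as.matrix(iris[, 1:4]), k_range = 2:5)
prof                              # stability profile across k
prof$k[which.max(prof$contrast)]  # k with the largest stability contrast

Because the contrast is taken against a no-signal reference, the profile highlights the largest differences in stability without requiring a direct, data-specific threshold on the stability values themselves.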