Combined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL): A Robust Method for Selection of Cluster Number, K.

Timothy E Sweeney, Olivier Gevaert, Albert C Chen

doi:10.1038/srep16971

Abstract

In order to discover new subsets (clusters) of a data set, researchers often use algorithms that perform unsupervised clustering, namely, the algorithmic separation of a dataset into some number of distinct clusters. Deciding whether a particular separation (or number of clusters, K) is correct is a sort of ‘dark art’, with multiple techniques available for assessing the validity of unsupervised clustering algorithms. Here, we present a new technique for unsupervised clustering that uses multiple clustering algorithms, multiple validity metrics, and progressively bigger subsets of the data to produce an intuitive 3D map of cluster stability that can help determine the optimal number of clusters in a data set, a technique we call COmbined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL). COMMUNAL locally optimizes algorithms and validity measures for the data being used. We show its application to simulated data with a known K, and then apply this technique to several well-known cancer gene expression datasets, showing that COMMUNAL provides new insights into clustering behavior and stability in all tested cases. COMMUNAL is shown to be a useful tool for determining K in complex biological datasets, and is freely available as a package for R.
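
The idea of pooling several clustering algorithms and several validity measures across a range of K can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the COMMUNAL R package or its exact procedure: the two algorithms (k-means and a simple k-medoids), the two measures (mean silhouette width and the Dunn index), the z-score averaging used to combine them, and every function name here are illustrative choices. The toy data are three well-separated groups of points, so the known K is 3.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then keep adding the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def partition(points, k, update, iters=50):
    """Generic centre-based clustering; `update` decides how a centre is recomputed."""
    centers = farthest_first(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        groups = [[p for p, l in zip(points, labels) if l == j] for j in range(k)]
        new = [update(g) if g else old for g, old in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return labels

def mean_center(group):      # k-means style update
    return tuple(sum(c) / len(group) for c in zip(*group))

def medoid_center(group):    # k-medoids style update
    return min(group, key=lambda m: sum(math.dist(m, q) for q in group))

def silhouette(points, labels):
    """Mean silhouette width; points in singleton clusters contribute 0."""
    total = 0.0
    for i, p in enumerate(points):
        mean_d = {}
        for j in set(labels):
            others = [q for n, q in enumerate(points) if labels[n] == j and n != i]
            if others:
                mean_d[j] = sum(math.dist(p, q) for q in others) / len(others)
        if labels[i] not in mean_d:
            continue
        a = mean_d[labels[i]]
        b = min(v for j, v in mean_d.items() if j != labels[i])
        total += (b - a) / max(a, b)
    return total / len(points)

def dunn(points, labels):
    """Dunn index: smallest between-cluster distance over largest within-cluster distance."""
    pairs = list(combinations(zip(points, labels), 2))
    inter = min(math.dist(p, q) for (p, lp), (q, lq) in pairs if lp != lq)
    intra = max(math.dist(p, q) for (p, lp), (q, lq) in pairs if lp == lq)
    return inter / intra

def zscores(values):
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]

def combined_map(points, ks):
    """Average the standardized validity scores over all algorithm/measure pairs."""
    algorithms = {"kmeans": mean_center, "kmedoids": medoid_center}
    measures = {"silhouette": silhouette, "dunn": dunn}
    combined = [0.0] * len(ks)
    for update in algorithms.values():
        labelings = [partition(points, k, update) for k in ks]
        for measure in measures.values():
            for i, z in enumerate(zscores([measure(points, l) for l in labelings])):
                combined[i] += z
    n = len(algorithms) * len(measures)
    return {k: s / n for k, s in zip(ks, combined)}

# Toy data (an assumption for illustration): three well-separated groups.
offsets = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (0.5, 0.5)]
points = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (5, 9)] for dx, dy in offsets]

scores = combined_map(points, [2, 3, 4, 5])
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Standardizing each measure across K before averaging keeps a measure with a large numeric range (here the Dunn index) from dominating one with a small range (the silhouette width). Other combination rules, such as rank averaging, would fit the same skeleton.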

Highlights

In order to discover new subsets of a data set, researchers often use algorithms that perform unsupervised clustering, namely, the algorithmic separation of a dataset into some number of distinct clusters
Typically, a single subset of the data is clustered with a single clustering algorithm, and the result is assessed by a single cluster validity measure
We hypothesized that integrating information from multiple clustering algorithms and multiple validity measures would improve the signal:noise ratio and assist in identifying stable clusters

Summary

Introduction

In order to discover new subsets (clusters) of a data set, researchers often use algorithms that perform unsupervised clustering, namely, the algorithmic separation of a dataset into some number of distinct clusters. There can be significant disagreement over how many clusters truly exist, or what the ‘right’ number of clusters is; for instance, in glioblastoma multiforme gene expression data, different studies have come to different conclusions for K (from 2–4)[1,2,3]. This disagreement arises when different clustering algorithms are used to separate the data, and when different clustering validity metrics are used to judge the ‘right’ number of clusters. Among existing methods, one approach takes multiple subsets of a dataset and uses repeated predictions of cluster assignment to gauge stability[5]. Another method, HOPACH, recursively partitions a dataset while seeking to optimize some clustering measure[6].
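
The subsampling idea mentioned above (cluster many subsets of the data and check how consistently pairs of samples end up together) can be sketched as follows. This is a hedged illustration only, not the cited method's implementation: the plain k-means routine, the 80% subsample size, the number of resamples, the `stability` score, and all function names are assumptions made for the example.

```python
import math
import random
from itertools import combinations

def kmeans_labels(points, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-first seeding."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        groups = [[p for p, l in zip(points, labels) if l == j] for j in range(k)]
        new = [tuple(sum(c) / len(g) for c in zip(*g)) if g else old
               for g, old in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return labels

def consensus_matrix(points, k, n_resamples=20, fraction=0.8, seed=0):
    """For each pair of samples, the fraction of shared subsamples in which they co-cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = {pair: 0 for pair in combinations(range(n), 2)}
    sampled = dict(together)
    size = max(k, round(fraction * n))
    for _ in range(n_resamples):
        idx = sorted(rng.sample(range(n), size))
        labels = kmeans_labels([points[i] for i in idx], k)
        for (a, la), (b, lb) in combinations(zip(idx, labels), 2):
            sampled[(a, b)] += 1
            together[(a, b)] += la == lb
    return {pair: together[pair] / sampled[pair] for pair in together if sampled[pair]}

def stability(points, k, **kwargs):
    """1.0 when every pair is always together or always apart; lower when assignments flip."""
    cons = consensus_matrix(points, k, **kwargs)
    return sum(abs(2 * c - 1) for c in cons.values()) / len(cons)

# Toy data (an assumption for illustration): three well-separated groups.
offsets = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (0.5, 0.5)]
points = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (5, 9)] for dx, dy in offsets]

for k in (2, 3, 4):
    print(k, round(stability(points, k), 3))
```

On data like this, the score at the true K reaches its maximum because every subsample recovers the same three groups, whereas other values of K may force arbitrary merges or splits that vary with the subsample.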

Journal: Scientific Reports
Publication Date: Nov 19, 2015
Citations: 16
License type: CC BY 4.0