Identifying clusters in genomics data by recursive partitioning

Gro Nilsen, Knut Liestøl, Ørnulf Borgan, Ole Christian Lingjærde

doi:10.1515/sagmb-2013-0016

Abstract

Genomics studies frequently involve clustering of molecular data to identify groups, but common clustering methods such as K-means clustering and hierarchical clustering do not determine the number of clusters. Methods for estimating the number of clusters typically focus on identifying the global structure in the data; however, the discovery of substructures within clusters may also be of great biological interest. We propose a novel method, Partitioning Algorithm based on Recursive Thresholding (PART), that recursively uncovers distinct subgroups in the groups already identified. Outliers are common in high-dimensional genomics data and may mask the presence of substructure within a cluster. A crucial feature of the algorithm is the introduction of tentative splits of clusters to isolate outliers that might otherwise halt the recursion prematurely. The method is demonstrated on simulated data as well as on a wide range of real data sets from gene expression microarrays for which the correct clusters were known in advance. When subclusters are present and the variance is large or varies between the clusters, the proposed method performs better than two established global methods on simulated data. On the real data sets, the overall performance of PART is superior to that of the global methods when used in combination with hierarchical clustering. The method is implemented in the R package clusterGenomics and is freely available from CRAN (The Comprehensive R Archive Network).
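
The idea of tentative splits can be illustrated with a toy sketch. This is not the PART algorithm or the clusterGenomics API: the abstract does not specify the split rule or the stopping criterion, so the version below makes several assumptions for illustration only. It works on one-dimensional data, splits at the widest gap between sorted points, accepts a split only when the gap is at least `min_gap` and both sides hold at least `min_size` points, and treats a split that isolates fewer than `min_size` points as tentative. The function name and all parameters (`min_gap`, `min_size`, `max_tentative`) are hypothetical.

```python
from typing import List, Tuple


def split_at_widest_gap(pts: List[float]) -> Tuple[List[float], List[float], float]:
    """Split sorted 1-D points at the widest gap between neighbours."""
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    return pts[: i + 1], pts[i + 1:], gaps[i]


def recursive_partition_sketch(
    points: List[float],
    min_gap: float,        # assumed stopping threshold (hypothetical)
    min_size: int,         # assumed minimum size of a real cluster (hypothetical)
    max_tentative: int = 2,  # assumed cap on tentative splits per cluster
) -> Tuple[List[List[float]], List[float]]:
    """Return (clusters, outliers) for a toy 1-D recursive partitioning."""
    core = sorted(points)
    outliers: List[float] = []
    for _ in range(max_tentative + 1):
        if len(core) < 2 * min_size:
            break  # too small to hold two clusters of min_size
        left, right, gap = split_at_widest_gap(core)
        if gap < min_gap:
            break  # no sufficiently clear split
        if min(len(left), len(right)) >= min_size:
            # Accepted split: recurse into both sides and keep the
            # points set aside by earlier tentative splits as outliers.
            cl, ol = recursive_partition_sketch(left, min_gap, min_size, max_tentative)
            cr, orr = recursive_partition_sketch(right, min_gap, min_size, max_tentative)
            return cl + cr, outliers + ol + orr
        # Tentative split: set the small side aside and look again
        # for structure in the remaining points.
        small, core = sorted((left, right), key=len)
        outliers += small
    # No accepted split was found: discard the tentative splits and
    # return all points as a single cluster.
    return [sorted(core + outliers)], []


data = [1.0, 1.1, 1.2, 1.3, 5.0, 5.1, 5.2, 5.3, 40.0]
clusters, outliers = recursive_partition_sketch(data, min_gap=2.0, min_size=3)
print(clusters, outliers)
```

In this toy data the widest gap isolates the single point 40.0. With `max_tentative=0` the recursion stops there and all nine points are returned as one cluster; allowing a tentative split sets 40.0 aside, after which the two groups near 1 and 5 are separated.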

Journal: Statistical Applications in Genetics and Molecular Biology
Publication Date: Jan 1, 2013
Citations: 32
