Improve consensus partitioning via a hierarchical procedure

Zuguang Gu, Daniel Hübschmann

doi:10.1093/bib/bbac048

Open Access

Journal: Briefings in Bioinformatics
Publication Date: Mar 14, 2022
Citations: 1
License type: CC BY-NC 4.0

Affiliation: National Center for Tumor Diseases

Abstract

Consensus partitioning is an unsupervised method widely used in high-throughput data analysis to reveal subgroups and to assess the stability of the resulting classification. However, standard consensus partitioning procedures perform poorly at identifying large numbers of stable subgroups, for two main reasons. First, subgroups with small differences are difficult to separate when they are detected simultaneously with subgroups that have large differences. Second, classification stability generally decreases as the number of subgroups increases. In this work, we proposed a new strategy that addresses both issues by applying consensus partitioning in a hierarchical procedure. We demonstrated that hierarchical consensus partitioning can efficiently reveal more meaningful subgroups. We also tested its performance in revealing a large number of subgroups on a large DNA methylation dataset. Hierarchical consensus partitioning is implemented in the R package cola, which provides comprehensive functionality for analysis and visualization. The analysis can also be fully automated with as few as two lines of code, which generate a detailed HTML report containing the complete analysis. The cola package is available at https://bioconductor.org/packages/cola/.
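The abstract combines two ideas: consensus partitioning (cluster many random subsets of the samples and measure how consistently pairs of samples end up together) and applying it recursively so that weak subgroups are searched for only inside groups already separated. The sketch below illustrates both in Python under stated assumptions; it is not cola's implementation (cola is an R package). Assumptions: samples are the rows of `x`; plain k-means with farthest-first initialisation is the base clusterer; each repetition subsamples 80% of the samples; stability is scored as 1-PAC (the share of sample pairs whose consensus lies outside the ambiguous interval 0.1 to 0.9); every node is split with a fixed k = 2; and the most variable features are re-selected within each node. The function names `kmeans`, `consensus_partition`, and `hierarchical_consensus` and all thresholds (`n_top=10`, `min_size=15`, `min_stability=0.9`) are hypothetical.

```python
import numpy as np


def kmeans(x, k, rng, n_iter=100):
    """Lloyd's k-means on the rows of x with farthest-first initialisation."""
    centers = [x[rng.integers(len(x))]]
    while len(centers) < k:
        # next center: the point farthest from all centers chosen so far
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # recompute centers; keep the old center if a cluster became empty
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def consensus_partition(x, k, n_rep=30, frac=0.8, seed=0):
    """Cluster random subsets of samples repeatedly; return final labels and 1-PAC."""
    rng = np.random.default_rng(seed)
    n = len(x)
    together = np.zeros((n, n))  # times a pair landed in the same cluster
    sampled = np.zeros((n, n))   # times a pair was subsampled together
    for _ in range(n_rep):
        idx = rng.choice(n, size=max(k, int(frac * n)), replace=False)
        labels = kmeans(x[idx], k, rng)
        together[np.ix_(idx, idx)] += labels[:, None] == labels[None, :]
        sampled[np.ix_(idx, idx)] += 1
    consensus = np.divide(together, sampled, out=np.zeros((n, n)), where=sampled > 0)
    # 1 - PAC: share of pairs whose consensus is NOT ambiguous (outside 0.1..0.9)
    vals = consensus[np.triu_indices(n, 1)]
    stability = 1 - np.mean((vals > 0.1) & (vals < 0.9))
    # final classes: cluster samples by their rows of the consensus matrix
    return kmeans(consensus, k, rng), stability


def hierarchical_consensus(x, ids=None, n_top=10, min_size=15,
                           min_stability=0.9, depth=0, max_depth=4):
    """Recursively split samples with k = 2 consensus partitioning (hypothetical)."""
    if ids is None:
        ids = np.arange(len(x))
    if len(ids) < 2 * min_size or depth >= max_depth:
        return [ids]
    sub = x[ids]
    # re-select the most variable features within this node, so subtle
    # differences are not drowned out by large ones already separated above
    top = np.argsort(sub.std(axis=0))[::-1][:n_top]
    labels, stability = consensus_partition(sub[:, top], k=2)
    if stability < min_stability or np.bincount(labels, minlength=2).min() < min_size:
        return [ids]  # unstable or too small a split: this node is a leaf
    leaves = []
    for j in range(2):
        leaves += hierarchical_consensus(x, ids[labels == j], n_top, min_size,
                                         min_stability, depth + 1, max_depth)
    return leaves
```

The design mirrors the two issues named in the abstract: each level only has to resolve one large difference (k = 2), which keeps stability high, and because features are re-selected inside each node, the small differences between subgroups become the dominant signal once the large differences have been split off. A real implementation would also choose k per node and compare several clustering methods; this sketch deliberately omits that.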