Comparison of threshold selection methods for microarray gene co-expression matrices

Bhavesh R Borate,Arnold M Saxton,Brynn H Voy,Elissa J Chesler,Michael A Langston

doi:10.1186/1756-0500-2-240


Open Access


Abstract

Background: Network and clustering analyses of microarray co-expression correlation data often require application of a threshold to discard small correlations, thus reducing computational demands and decreasing the number of uninformative correlations. This study investigated threshold selection in the context of combinatorial network analysis of transcriptome data.

Findings: Six conceptually diverse methods - based on number of maximal cliques, correlation of control spots with expressed genes, top 1% of correlations, spectral graph clustering, Bonferroni correction of p-values, and statistical power - were used to estimate a correlation threshold for three time-series microarray datasets. The validity of thresholds was tested by comparison to thresholds derived from Gene Ontology information. Stability and reliability of the best methods were evaluated with block bootstrapping.

Two threshold methods, number of maximal cliques and spectral graph, used information in the correlation matrix structure and performed well in terms of stability. Comparison to Gene Ontology found thresholds from number of maximal cliques extracted from a co-expression matrix were the most biologically valid. Approaches to improve both methods were suggested.

Conclusion: Threshold selection approaches based on network structure of gene relationships gave thresholds with greater relevance to curated biological relationships than approaches based on statistical pair-wise relationships.

Highlights

To extract gene networks from microarray data, correlations are often used as a measure of gene co-expression
Threshold selection approaches based on network structure of gene relationships gave thresholds with greater relevance to curated biological relationships than approaches based on statistical pair-wise relationships
We focus on relevance networks, which are created by applying a hard threshold to the gene expression correlation matrix [2] in order to extract gene networks

Summary

Introduction

To extract gene networks from microarray data, correlations are often used as a measure of gene co-expression. A typical microarray with 20,000 gene probes produces about 200 million pairwise correlations. Correlations below a threshold value, that is, those closer to zero, are less likely to be meaningful. Both hard and soft threshold approaches have been applied to biological data. Hard thresholds discard gene pairs whose correlation falls below the threshold, while soft thresholds use the correlation value to weight gene network relationships. Zhang and Horvath [1] concluded that soft thresholds based on aggregate, modular relationships between genes gave [...]. Network and clustering analyses of microarray co-expression correlation data often require application of a threshold to discard small correlations, reducing computational demands and decreasing the number of uninformative correlations. This study investigated threshold selection in the context of combinatorial network analysis of transcriptome data.
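The distinction between hard and soft thresholds, and the pair count quoted above, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the use of Pearson correlation via NumPy, thresholding on the absolute correlation, and the power-law weighting |r|^beta with beta = 6 for the soft threshold are all assumptions made for illustration; the text above only says that soft thresholds weight relationships by the correlation value.

```python
import numpy as np


def n_pairs(n_genes):
    """Number of distinct gene pairs, i.e. pairwise correlations."""
    return n_genes * (n_genes - 1) // 2


def hard_threshold(corr, tau):
    """Unweighted adjacency: keep a gene pair only if |r| >= tau.

    Thresholding on the absolute value is an assumption of this sketch.
    """
    adj = (np.abs(corr) >= tau).astype(int)
    np.fill_diagonal(adj, 0)  # ignore self-correlations
    return adj


def soft_threshold(corr, beta=6):
    """Weighted adjacency: every pair is kept, weighted by |r| ** beta.

    The power form and beta=6 are illustrative assumptions.
    """
    weights = np.abs(corr) ** beta
    np.fill_diagonal(weights, 0.0)
    return weights


# Hypothetical toy data: 5 genes (rows) measured in 12 samples (columns).
rng = np.random.default_rng(0)
expr = rng.normal(size=(5, 12))
corr = np.corrcoef(expr)  # 5 x 5 gene-by-gene Pearson correlation matrix

adj = hard_threshold(corr, tau=0.5)
weights = soft_threshold(corr, beta=6)

print(n_pairs(20_000))  # 199990000, i.e. about 200 million
```

A hard threshold yields a sparse 0/1 matrix that combinatorial methods such as clique enumeration can work on directly, whereas the soft version keeps every pair and only down-weights the weak ones.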

Methods

Results

Conclusion