Abstract

Background

To construct gene co-expression networks, it is necessary to evaluate the correlation between different gene expression profiles. However, commonly used correlation metrics, including both linear (such as Pearson’s correlation) and monotonic (such as Spearman’s correlation) dependence metrics, are not sufficient to capture the nature of real biological systems. Introducing a more informative correlation metric when constructing gene co-expression networks therefore remains an open topic.

Results

In this paper, we compare distance correlation, a correlation metric that captures both linear and non-linear dependence, with three other typical metrics (Pearson’s correlation, Spearman’s correlation, and the maximal information coefficient) on four datasets: two microarray datasets (macrophage and liver) and two RNA-seq datasets (cervical cancer and pancreatic cancer). Among all the metrics, distance correlation is distribution free, performs better on complex relationships, and is more robust to outliers. Furthermore, we apply distance correlation to Weighted Gene Co-expression Network Analysis (WGCNA) to build a gene co-expression network analysis method, which we name Distance Correlation-based Weighted Gene Co-expression Network Analysis (DC-WGCNA). Compared with traditional WGCNA, DC-WGCNA can enhance the results of enrichment analysis and improve module stability.

Conclusions

Compared with other correlation metrics, distance correlation is better at revealing complex biological relationships between gene profiles, which leads to more meaningful modules when analyzing gene co-expression networks. However, because of the high computational complexity of distance correlation, the implementation requires more computer memory.
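The summary does not include code, so the following is only a minimal sketch, assuming NumPy, of two ideas named above: the standard sample distance correlation (the double-centred distance estimator of Székely et al.) and a hypothetical DC-WGCNA-style adjacency in which distance correlation replaces Pearson’s r inside WGCNA’s soft-threshold power adjacency a_ij = |cor_ij|^β. The function names `distance_correlation` and `dc_adjacency` and the default β = 6 are illustrative assumptions, not the authors’ implementation.

```python
import numpy as np


def _double_centred_distances(v: np.ndarray) -> np.ndarray:
    """Pairwise distances |v_i - v_j| with row, column and grand means removed."""
    d = np.abs(v[:, None] - v[None, :])
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y) -> float:
    """Sample distance correlation between two 1-D expression profiles."""
    a = _double_centred_distances(np.asarray(x, dtype=float))
    b = _double_centred_distances(np.asarray(y, dtype=float))
    dcov2_xy = (a * b).mean()                      # squared distance covariance
    dvar2_x, dvar2_y = (a * a).mean(), (b * b).mean()
    if dvar2_x == 0.0 or dvar2_y == 0.0:
        return 0.0                                 # constant profile: no dependence measurable
    return float(np.sqrt(max(dcov2_xy, 0.0) / np.sqrt(dvar2_x * dvar2_y)))


def dc_adjacency(expr: np.ndarray, beta: float = 6.0) -> np.ndarray:
    """Hypothetical DC-WGCNA-style adjacency a_ij = dCor(gene_i, gene_j) ** beta.

    expr has shape (n_samples, n_genes), samples in rows as in WGCNA input.
    """
    n_genes = expr.shape[1]
    sim = np.eye(n_genes)                          # dCor of a non-constant gene with itself is 1
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            sim[i, j] = sim[j, i] = distance_correlation(expr[:, i], expr[:, j])
    return sim ** beta                             # soft thresholding, as in WGCNA


# Toy usage: 30 samples x 5 genes, where gene 1 depends non-linearly on gene 0.
rng = np.random.default_rng(0)
expr = rng.normal(size=(30, 5))
expr[:, 1] = expr[:, 0] ** 2
adj = dc_adjacency(expr, beta=6)
```

Each gene pair builds two n × n distance matrices over the n samples, so memory grows quadratically with sample size and the pair loop grows quadratically with gene count, which is consistent with the memory cost noted in the Conclusions.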

Highlights

  • To construct gene co-expression networks, it is necessary to evaluate the correlation between different gene expression profiles.

  • We validate the performance of the new algorithm based on scale-free topology (SFT) fit, clustering results, enrichment analysis, and module stability by analysing gene expression profiles from four datasets of microarray and RNA-seq data.

  • Distance correlation is distribution free. A normal distribution is not a requirement for using the Pearson correlation coefficient, but the reliability of significance testing for the correlation may be reduced, so the Pearson correlation coefficient is usually not recommended for non-normally distributed data [12, 15, 27, 28].
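One validation criterion listed above is the scale-free topology (SFT) fit. As a rough sketch, assuming NumPy and a simplified fit index (equal-width connectivity bins, empty bins dropped, and the signed R² = -sign(slope)·R² convention commonly plotted in WGCNA tutorials), the fit can be computed from any weighted adjacency matrix, including one built from distance correlation. The function name `scale_free_fit` and its binning details are assumptions; WGCNA’s own `pickSoftThreshold` handles binning differently.

```python
import numpy as np


def scale_free_fit(adj: np.ndarray, n_bins: int = 10) -> tuple[float, float]:
    """Simplified scale-free topology fit for a weighted adjacency matrix.

    Returns (signed R^2, slope) of the regression log10 p(k) ~ log10 k,
    where k_i is the connectivity of gene i. A scale-free network gives a
    near-straight line with negative slope, i.e. a signed R^2 close to 1.
    """
    k = adj.sum(axis=1) - np.diag(adj)             # connectivity, self-loop excluded
    if k.max() == k.min():
        raise ValueError("all genes have the same connectivity; no fit possible")
    edges = np.linspace(k.min(), k.max(), n_bins + 1)
    idx = np.digitize(k, edges[1:-1])              # equal-width bin index 0..n_bins-1
    counts = np.bincount(idx, minlength=n_bins)
    k_sums = np.bincount(idx, weights=k, minlength=n_bins)
    keep = (counts > 0) & (k_sums > 0)             # simplification: drop empty bins
    if keep.sum() < 2:
        raise ValueError("too few occupied bins to fit a line")
    log_k = np.log10(k_sums[keep] / counts[keep])  # mean connectivity per bin
    log_p = np.log10(counts[keep] / k.size)        # fraction of genes per bin
    slope, _ = np.polyfit(log_k, log_p, 1)
    r2 = np.corrcoef(log_k, log_p)[0, 1] ** 2      # R^2 of a simple linear fit
    return float(-np.sign(slope) * r2), float(slope)
```

In a WGCNA-style workflow, this would be evaluated over a range of soft-threshold powers β, picking the smallest β whose signed R² is high enough.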

Summary

Introduction

To construct and analyse a gene co-expression network, it is necessary to assess the interactions between pairs of genes. Such interactions are measured by calculating the correlation coefficients between gene expression profiles. If the correlation measure used to construct the network is limited to linear dependence, the network’s ability to recover the true relationships and identify appropriate gene modules will be limited. To overcome this barrier, additional methods are needed to measure the complex relationships between genes.
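To make this limitation concrete, here is a minimal NumPy sketch (not taken from the paper) using a toy profile x and a U-shaped partner y = x². Because the relationship is symmetric and non-monotonic, Pearson’s r and Spearman’s rho (computed here with tie-averaged ranks) are both 0, while the sample distance correlation is clearly positive. The helper names are hypothetical, and the toy data are illustrative rather than real expression measurements.

```python
import numpy as np


def pearson(x, y) -> float:
    """Pearson's r: captures linear dependence only."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))


def average_ranks(v):
    """1-based ranks; tied values share the average of their ranks."""
    _, inverse, counts = np.unique(v, return_inverse=True, return_counts=True)
    upper = np.cumsum(counts)
    return (upper - (counts - 1) / 2.0)[inverse]


def spearman(x, y) -> float:
    """Spearman's rho: Pearson's r on ranks, so it captures monotonic dependence."""
    return pearson(average_ranks(x), average_ranks(y))


def distance_correlation(x, y) -> float:
    """Sample distance correlation; its population value is zero only under independence."""
    def centred(v):
        d = np.abs(v[:, None] - v[None, :])
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
    a, b = centred(x), centred(y)
    return float(np.sqrt((a * b).mean() / np.sqrt((a * a).mean() * (b * b).mean())))


x = np.arange(-50, 51, dtype=float)   # symmetric toy "expression" profile
y = x ** 2                            # non-monotonic (U-shaped) dependence
# Pearson and Spearman are both 0 here; distance correlation is clearly positive.
print(pearson(x, y), spearman(x, y), distance_correlation(x, y))
```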
