Abstract

Co-expression networks are essential tools to infer biological associations between gene products and predict gene annotation. Global networks can be analyzed at the transcriptome-wide scale or after querying them with a set of guide genes to capture the transcriptional landscape of a given pathway, in a process named Pathway-Level Coexpression (PLC). A critical step in network construction remains the definition of gene co-expression. In the present work, we compared how Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), their respective ranked values (Highest Reciprocal Rank, HRR), Mutual Information (MI) and Partial Correlations (PC) performed on global networks and PLCs. This evaluation was conducted on the model plant Arabidopsis thaliana using microarray and differently pre-processed RNA-seq datasets. We particularly evaluated how dataset × distance measurement combinations performed in 5 PLCs corresponding to 4 well-described plant metabolic pathways (phenylpropanoid, carbohydrate, fatty acid and terpene metabolisms) and the cytokinin signaling pathway. Our work shows that PCC ranked with HRR is better suited than the other distance measures for global network construction and PLC with both microarray and RNA-seq data, especially for clustering genes into partitions that resemble biological subpathways.

Highlights

  • Co-expression networks are essential tools to infer biological associations between gene products and predict gene annotation

  • The performance of each network was defined as its ability to capture edges corresponding to functional associations found in the Gene Ontology (GO) reference dataset, and was evaluated in 4 different ways (Fig. 2): GO term enrichment (GO terms that are significantly enriched with gene pairs from the co-expression network); a ROC curve constructed with the True Positive Rates (TPR) and False Positive Rates (FPR) calculated for each confidence threshold; and two ROC analyses based on the guilt-by-association (GBA) concept, namely an average 3-fold cross-validated neighbor voting (NV) AUROC and a global AUROC

  • Our present work highlights that distances between genes calculated with the Pearson Correlation Coefficient (PCC) ranked by Highest Reciprocal Rank (HRR), i.e. PCC-HRR, improve Pathway-Level Coexpression (PLC)
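To make the PCC-HRR distance named in the highlights more concrete, here is a minimal Python sketch. It assumes the commonly used definition of the Highest Reciprocal Rank, HRR(a, b) = max(rank of b in a's PCC-sorted list, rank of a in b's PCC-sorted list), which this page does not spell out. The function name `highest_reciprocal_rank` and the toy matrix `expr_demo` are hypothetical and serve illustration only; they are not taken from the paper.

```python
import numpy as np

def highest_reciprocal_rank(expr):
    """Return a genes x genes HRR matrix from a genes x samples array.

    Assumed definition: HRR(a, b) = max(rank_a(b), rank_b(a)), where
    rank_a(b) is the position of gene b in gene a's list of partners
    sorted by decreasing PCC (1 = most co-expressed).
    """
    pcc = np.corrcoef(expr)              # pairwise Pearson correlations
    np.fill_diagonal(pcc, -np.inf)       # push each gene to the end of its own list
    order = np.argsort(-pcc, axis=1)     # partners sorted by decreasing PCC
    n = pcc.shape[0]
    ranks = np.empty_like(order)
    rows = np.arange(n)[:, None]
    ranks[rows, order] = np.arange(1, n + 1)   # ranks[a, b] = rank of b for gene a
    hrr = np.maximum(ranks, ranks.T)     # keep the worse of the two reciprocal ranks
    np.fill_diagonal(hrr, 0)             # a gene has no distance to itself
    return hrr

# Toy data (hypothetical): 4 genes measured in 5 samples.
expr_demo = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.5],
    [5.0, 3.0, 4.0, 1.0, 2.0],
    [1.0, 3.0, 2.0, 5.0, 4.0],
])
hrr_demo = highest_reciprocal_rank(expr_demo)
```

A low HRR means that two genes rank each other highly, so small values in `hrr_demo` mark the strongest reciprocal associations. Genes with constant expression would yield undefined correlations and should be filtered out beforehand.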

Introduction

Co-expression networks are essential tools to infer biological associations between gene products and predict gene annotation. Guide gene sets may be used to extract more human-readable information from large networks in a process named Pathway-Level Coexpression (PLC)[2,3,4,5,6,7]. This approach aims at capturing the best transcriptional associations of a gene set and at highlighting functional gene groups, such as known subpathways, within this set. Partial correlations (PCs) between two genes x and y given a third gene z can be computed from Rxy, Ryz and Rxz, the simple correlation coefficients between genes x and y, y and z, and x and z respectively[10], but such computations may be very time-consuming for large datasets. In this case, PCs should rather be calculated by multiple linear regression including a feature selection step[11].
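As an illustration of how a partial correlation can be obtained from the three simple correlation coefficients mentioned above, the sketch below uses the standard first-order partial correlation formula, (Rxy - Rxz·Ryz) / sqrt((1 - Rxz²)(1 - Ryz²)). This is an assumption: the exact expression used in the paper is not reproduced on this page. The function name `first_order_partial_corr` and the toy vectors are hypothetical.

```python
import numpy as np

def first_order_partial_corr(x, y, z):
    """Partial correlation between x and y after removing the effect of z.

    Uses the standard first-order formula built from the three pairwise
    Pearson correlations (assumed here, not quoted from the paper).
    """
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Toy example (hypothetical): x and y are both driven by z, plus
# gene-specific components that are uncorrelated with z and with each other.
z_demo = np.arange(1.0, 9.0)
x_demo = z_demo + np.array([1, -1, -1, 1, 1, -1, -1, 1], dtype=float)
y_demo = z_demo + np.array([1, -1, 1, -1, -1, 1, -1, 1], dtype=float)

simple_r = np.corrcoef(x_demo, y_demo)[0, 1]            # strong apparent association
partial_r = first_order_partial_corr(x_demo, y_demo, z_demo)  # association once z is accounted for
```

In this toy case the simple correlation between x and y is high only because both follow z, and the partial correlation removes that shared component. Applying the formula to every gene triplet scales poorly with the number of genes, which is consistent with the text's remark that regression-based estimation is preferable for large datasets.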
