Pan- and core- network analysis of co-expression genes in a model plant.

Fei He, Sergei Maslov

doi:10.1038/srep38956

Abstract

Many genome-wide gene expression experiments have been performed in the model plant Arabidopsis during the last decade. Some of these studies constructed coexpression networks, a popular technique for identifying groups of co-regulated genes and inferring unknown gene functions. One approach is to construct a single coexpression network by combining multiple expression datasets generated in different labs. We advocate a complementary approach in which we construct a large collection of 134 coexpression networks, each based on the expression datasets reported in an individual publication. To this end we reanalyzed public expression data. To describe this collection of networks we introduced the concepts of ‘pan-network’ and ‘core-network’, representing the union and the intersection, respectively, of a sizeable fraction of the individual networks. We showed that these two types of networks differ both in their topology and in the biological functions of the interacting genes. For example, the modules of the pan-network are enriched in regulatory and signaling functions, while the modules of the core-network tend to include components of large macromolecular complexes such as ribosomes and the photosynthetic machinery. Our analysis aims to help the plant research community better explore the information contained in the vast existing collection of gene expression data in Arabidopsis.

Highlights

Coexpression networks represent all pairwise relationships of genes that have similar profiles in a given set of expression samples
In this study we built coexpression networks based on individual expression datasets from the model plant Arabidopsis in order to capture network aspects that appeared in a specific experimental setup[21]
Thousands of gene expression profiling datasets are available for Arabidopsis in public repositories such as Gene Expression Omnibus (GEO)

Summary

Introduction

Coexpression networks represent all pairwise relationships of genes that have similar profiles in a given set of expression samples. A common approach to building coexpression networks is to infer correlation relationships from a combination of multiple expression datasets produced by different labs[1,2,13,14,15]. However, batch effects may give rise to spurious, false-positive correlations between genes when microarray data from different labs are combined[19,20]. Another problem with combined coexpression networks is that they may miss rare gene interactions formed under specific conditions, such as a particular disease[21]. Detecting coexpression from a combined dataset of microarray samples from different labs is therefore not always the best choice. We construct and analyze a comprehensive collection of 134 coexpression networks, each based on expression samples from an individual published study, which preserves context-specific network structure. Our analysis may provide a guide for future coexpression network analysis in plants.
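The per-dataset approach can be illustrated with a minimal sketch. It builds one coexpression network per dataset and then collects the edges supported by a chosen number of networks: a low cutoff gives a pan-network-like union, a high cutoff a core-network-like intersection. Everything specific here is an assumption made for illustration and is not taken from the paper: the use of Pearson correlation, the cutoff `r_min`, the support cutoffs, the function names, and the toy expression values.

```python
from collections import Counter
from itertools import combinations

import numpy as np


def coexpression_edges(expr, genes, r_min=0.8):
    """Gene pairs whose Pearson correlation is at least r_min.

    expr: 2-D array, rows = genes, columns = samples of ONE dataset.
    r_min is an illustrative cutoff (assumption); only positive
    correlations are kept in this sketch.
    """
    corr = np.corrcoef(expr)  # rows are treated as variables (genes)
    edges = set()
    for i, j in combinations(range(len(genes)), 2):
        if corr[i, j] >= r_min:
            edges.add(frozenset((genes[i], genes[j])))
    return edges


def edges_by_support(networks, min_count):
    """Edges present in at least min_count of the individual networks."""
    counts = Counter(edge for net in networks for edge in net)
    return {edge for edge, n in counts.items() if n >= min_count}


genes = ["A", "B", "C", "D"]

# Three toy datasets (hypothetical values), four samples each.
datasets = [
    np.array([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [1, 4, 2, 3]]),
    np.array([[1, 2, 3, 4], [1, 2, 3, 5], [2, 4, 6, 9], [4, 1, 3, 2]]),
    np.array([[1, 2, 3, 4], [2, 3, 4, 6], [3, 1, 4, 2], [1, 3, 4, 2]]),
]

# One network per dataset; samples are never pooled across datasets.
networks = [coexpression_edges(d, genes) for d in datasets]

# Low support cutoff: union-like pan-network (cutoff is an assumption).
pan = edges_by_support(networks, min_count=1)
# High support cutoff: intersection-like core-network (cutoff is an assumption).
core = edges_by_support(networks, min_count=3)

print("pan: ", sorted(sorted(e) for e in pan))
print("core:", sorted(sorted(e) for e in core))
```

In this toy example the pair A-B is coexpressed in every dataset and so survives the high cutoff, while A-C and B-C appear in only one dataset and are kept only in the union-like set. With a large collection of networks, the two cutoffs would more plausibly be chosen as fractions of the number of networks than as fixed counts.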

Methods

Results

Conclusion