Gene Coexpression Network Analysis as a Source of Functional Annotation for Rice Genes

Kevin L Childs, C Robin Buell, Rebecca M Davidson, Najib M El-Sayed

doi:10.1371/journal.pone.0022196

Open Access

Journal: PLoS ONE	Publication Date: Jul 22, 2011
Citations: 181	License type: CC BY 4.0

Affiliation: Michigan State University, Michigan, United States

Abstract

With the existence of large publicly available plant gene expression data sets, many groups have undertaken data analyses to construct gene coexpression networks and functionally annotate genes. Often, a large compendium of unrelated or condition-independent expression data is used to construct gene networks. Condition-dependent expression experiments consisting of well-defined conditions/treatments have also been used to create coexpression networks to help examine particular biological processes. Gene networks derived from either condition-dependent or condition-independent data can be difficult to interpret if a large number of genes and connections are present. However, algorithms exist to identify modules of highly connected and biologically relevant genes within coexpression networks. In this study, we have used publicly available rice (Oryza sativa) gene expression data to create gene coexpression networks using both condition-dependent and condition-independent data and have identified gene modules within these networks using the Weighted Gene Coexpression Network Analysis method. We compared the number of genes assigned to modules and the biological interpretability of gene coexpression modules to assess the utility of condition-dependent and condition-independent gene coexpression networks. For the purpose of providing functional annotation to rice genes, we found gene modules identified by coexpression analysis of condition-dependent gene expression experiments to be more useful than gene modules identified by analysis of a condition-independent data set. We have incorporated our results into the MSU Rice Genome Annotation Project database as additional expression-based annotation for 13,537 genes, 2,980 of which lack a functional annotation description. These results provide two new types of functional annotation for our database. Genes in modules are now associated with groups of genes that constitute a collective functional annotation of those modules. Additionally, the expression patterns of genes across the treatments/conditions of an expression experiment comprise a second form of useful annotation.

Highlights

The importance of large-scale gene expression analysis in understanding gene function became apparent with the first report of genome-wide transcript expression profiling with DNA microarrays [1].
Some individual chips were excluded after quality analysis, and in two cases, this resulted in all replicates for a single treatment being discarded: shoot 2Fe+P from GSE17245 and LL LDHC 124 hrs from E-MEXP-2506.
Several projects have performed correlation analyses on plant gene expression data in order to identify gene associations that may imply common functions or even regulatory relationships [6,7,8,9,10,13,17,18,19,21,22,25,26]. Many of these efforts use combined expression data sets from numerous independent experiments, and the results are typically presented in terms of complex gene association networks.

Summary

Introduction

The importance of large-scale gene expression analysis in understanding gene function became apparent with the first report of genome-wide transcript expression profiling with DNA microarrays [1]. This led to the use of coexpression analyses to measure the physiological state of cells and to characterize genes with no known function [2]. Numerous projects perform large-scale gene expression analyses in which coexpression networks are created. Several of these combine results from individual experiments and utilize Pearson correlation coefficients between all gene pairs [5,6,7,8,9,10,11], while others incorporate multiple types of data including gene transcript levels, protein-protein interactions, metabolite profiles, and predicted conserved gene interactions [6,12,13,14]. Due to the complexity of gene coexpression networks, various methods have been used to find the most informative relationships within correlation networks [17,18,19,20,21,22,23,24].
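
As an illustration of the pairwise approach described above, the following minimal sketch computes Pearson correlation coefficients between all gene pairs and keeps the strongly correlated pairs as network edges. It is not the pipeline used in this study. The gene names, expression values, the function names `pearson` and `coexpression_edges`, and the parameters `beta` and `min_weight` are all hypothetical. Raising the absolute correlation to a power follows the general soft-thresholding idea of weighted coexpression networks; the power 6 and the cutoff 0.5 are assumptions chosen only for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, beta=6, min_weight=0.5):
    """Return {(gene_a, gene_b): weight} for every gene pair whose
    soft-thresholded weight |r| ** beta is at least min_weight.

    beta and min_weight are illustrative assumptions, not values
    taken from the study.
    """
    edges = {}
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        weight = abs(r) ** beta
        if weight >= min_weight:
            edges[(gene_a, gene_b)] = weight
    return edges


# Hypothetical toy data: three made-up genes measured in five conditions.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],  # rises and falls with geneA
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],  # unrelated pattern
}
edges = coexpression_edges(toy_expression)
```

With this toy input, only the geneA/geneB pair survives the cutoff. Because every pair of genes is compared, the number of correlations grows quadratically with the number of genes, which is one reason large networks become hard to interpret without a further step that groups genes into modules.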
