Abstract
The accurate construction and interpretation of gene association networks (GANs) is challenging, but crucial, to the understanding of gene function, interaction and cellular behavior at the genome level. Most current state-of-the-art computational methods for genome-wide GAN reconstruction require high-performance computational resources. However, even high-performance computing cannot fully address the complexity involved in constructing GANs from very large-scale expression profile datasets, especially for organisms with medium to large genomes, such as most plant species. Here, we present a new approach, GPLEXUS (http://plantgrn.noble.org/GPLEXUS/), which integrates a series of novel algorithms in a parallel-computing environment to construct and analyze genome-wide GANs. GPLEXUS adopts an ultra-fast estimation of pairwise mutual information that is similar in accuracy and sensitivity to the Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE) method and runs ∼1000 times faster. GPLEXUS integrates the Markov Clustering Algorithm to effectively identify functional subnetworks. Furthermore, GPLEXUS includes a novel ‘condition-removing’ method that identifies, from very large-scale gene expression datasets spanning several experimental conditions, the major conditions in which each subnetwork operates; this allows users to annotate the various subnetworks with experiment-specific conditions. We demonstrate GPLEXUS’s capabilities by constructing global GANs and analyzing subnetworks related to defense against biotic and abiotic stress, cell cycle growth and division in Arabidopsis thaliana.
Highlights
The availability of terabyte- and petabyte-sized gene expression datasets in public repositories [1,2] has inspired scientists to use genome-wide reverse genetic approaches to reconstruct gene networks and decipher the interaction between genes.
Our results show that the Spearman correlation-based transformation method implemented in GPLEXUS has a significantly reduced runtime compared with the original Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE) method and the B-spline-based mutual information (MI) estimation method.
It was computationally infeasible to construct global gene association networks (GANs) using large-scale genomic datasets from plant species with small genomes, such as A. thaliana and G. max, with the original ARACNE method on a typical server (DELL PowerEdge R815 Server equipped with four 8-core CPUs and 128-GB RAM) without any optimization.
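To illustrate why a Spearman correlation-based transformation can be much cheaper than density-based MI estimation, the following is a minimal sketch, not the GPLEXUS implementation. It assumes the transformation takes the closed form that holds for a bivariate Gaussian, MI = -0.5 ln(1 - rho^2), with rho replaced by the Spearman rank correlation; the source does not state the exact formula, so that choice is an assumption. The function names (`ranks`, `spearman`, `mi_from_rho`, `pairwise_mi`) and the `eps` cap are hypothetical and exist only for this example.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def mi_from_rho(rho, eps=1e-12):
    """Assumed transformation: Gaussian-form MI in nats, -0.5*ln(1 - rho^2).

    rho^2 is capped just below 1 so perfectly correlated pairs stay finite.
    """
    r2 = min(rho * rho, 1.0 - eps)
    return -0.5 * math.log(1.0 - r2)


def pairwise_mi(expr):
    """expr maps gene name -> list of expression values across samples.

    Each gene is ranked once; every pair then costs one pass over the samples.
    """
    genes = sorted(expr)
    rank_cache = {g: ranks(expr[g]) for g in genes}
    scores = {}
    for a in range(len(genes)):
        for b in range(a + 1, len(genes)):
            rho = pearson(rank_cache[genes[a]], rank_cache[genes[b]])
            scores[(genes[a], genes[b])] = mi_from_rho(rho)
    return scores


if __name__ == "__main__":
    # Toy expression profiles, invented for illustration only.
    toy = {
        "g1": [1, 2, 3, 4, 5],
        "g2": [2, 4, 5, 9, 20],
        "g3": [3, 1, 4, 2, 5],
    }
    for pair, mi in pairwise_mi(toy).items():
        print(pair, round(mi, 4))
```

In this sketch each gene is ranked once, so every gene pair costs only a single linear pass over the samples, with no kernel or B-spline density estimate per pair. That is one plausible source of the runtime reduction reported above, under the stated assumption about the transformation.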
Summary
The availability of terabyte- and petabyte-sized gene expression datasets in public repositories [1,2] has inspired scientists to use genome-wide reverse genetic approaches to reconstruct gene networks and decipher the interaction between genes. One problem inherent in the co-expression network method is its high false-positive prediction rate, which is due to its inability to distinguish direct gene interactions from the large number of indirect interactions. Other methods, such as the Bayesian Network [7] and the Gaussian Graphical Model (GGM) [9], can infer the local network structure with high precision [10], but cannot handle genome-wide network construction because of the increased computational complexity that arises from the large number of gene variables [10].