Abstract
Background: Altered networks of gene regulation underlie many complex conditions, including cancer. Inferring gene regulatory networks from high-throughput microarray expression data is a fundamental but challenging task in computational systems biology and its translation to genomic medicine. Although diverse computational and statistical approaches have been brought to bear on the gene regulatory network inference problem, their relative strengths and disadvantages remain poorly understood, largely because comparative analyses usually consider only small subsets of methods, use only synthetic data, and/or fail to adopt a common measure of inference quality.
Methods: We report a comprehensive comparative evaluation of nine state-of-the-art gene regulatory network inference methods encompassing the main algorithmic approaches (mutual information, correlation, partial correlation, random forests, support vector machines) using 38 simulated datasets and empirical serous papillary ovarian adenocarcinoma expression-microarray data. We then apply the best-performing method to infer normal and cancer networks. We assess the druggability of the proteins encoded by our predicted target genes using the CancerResource and PharmGKB webtools and databases.
Results: We observe large differences in the accuracy with which these methods predict the underlying gene regulatory network depending on features of the data, network size, topology, experiment type, and parameter settings. Applying the best-performing method (the supervised method SIRENE) to the serous papillary ovarian adenocarcinoma dataset, we infer and rank regulatory interactions, some previously reported and others novel. For selected novel interactions we propose testable mechanistic models linking gene regulation to cancer. Using network analysis and visualization, we uncover cross-regulation of angiogenesis-specific genes through three key transcription factors in normal and cancer conditions. Druggability analysis of proteins encoded by the 10 highest-confidence target genes, and by 15 genes with differential regulation in normal and cancer conditions, reveals 75% to be potential drug targets.
Conclusions: Our study represents a concrete application of gene regulatory network inference to ovarian cancer, demonstrating the complete cycle of computational systems biology research, from genome-scale data analysis via network inference, evaluation of methods, to the generation of novel testable hypotheses, their prioritization for experimental validation, and discovery of potential drug targets.
Highlights
Altered networks of gene regulation underlie many complex conditions, including cancer.
Gene regulatory network inference methods. We selected for comparison eight state-of-the-art unsupervised gene regulatory network (GRN) inference (GRNI) methods: Relevance Networks (RN) [36], Minimum Redundancy/Maximum Relevance Networks (MRNET) [33], Context Likelihood Relatedness (CLR) [37], the Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE) [38], Partial Correlation and Information Theory (PCIT) [39], Weighted Gene Co-expression Network Analysis (WGCNA) [40], Gene Network Inference with Ensemble of Trees (GENIE3) [41], and CORRELATIONS [42].
We found that bin numbers between 3 and 6 gave the best performance irrespective of the combination of GRNI method, mutual information (MI) estimator and discretization method (Figure S1 in Additional file 3).
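To make the role of the bin number concrete, the following is a minimal Python sketch of how a discretization step feeds an MI estimate. It is an illustration under stated assumptions, not the pipeline evaluated in the paper: equal-width binning and the plug-in (empirical) estimator are chosen here only for simplicity, the function names (`discretize_equal_width`, `empirical_mi`, `mi_by_bin_count`) are hypothetical, and the data are synthetic rather than microarray measurements.

```python
import numpy as np

def discretize_equal_width(values, n_bins):
    """Map continuous values to integer labels 0..n_bins-1 using equal-width bins."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Only the interior edges are passed, so the maximum lands in the last bin.
    return np.digitize(values, edges[1:-1])

def empirical_mi(x_labels, y_labels, n_bins):
    """Plug-in mutual information (in nats) from two vectors of bin labels."""
    joint = np.zeros((n_bins, n_bins))
    np.add.at(joint, (x_labels, y_labels), 1)   # joint count table
    joint /= joint.sum()                        # joint probabilities
    px = joint.sum(axis=1, keepdims=True)       # marginal of x, shape (n_bins, 1)
    py = joint.sum(axis=0, keepdims=True)       # marginal of y, shape (1, n_bins)
    nonzero = joint > 0                         # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nonzero] * np.log(joint[nonzero] / (px @ py)[nonzero])))

def mi_by_bin_count(x, y, bin_counts=(3, 4, 5, 6)):
    """Estimate MI for each bin count in the range the text reports as best."""
    return {
        b: empirical_mi(discretize_equal_width(x, b), discretize_equal_width(y, b), b)
        for b in bin_counts
    }

# Synthetic example (assumption): a target that depends non-linearly on a regulator.
rng = np.random.default_rng(0)
regulator = rng.normal(size=2000)
target = regulator ** 2 + 0.1 * rng.normal(size=2000)
unrelated = rng.normal(size=2000)

dependent_mi = mi_by_bin_count(regulator, target)
independent_mi = mi_by_bin_count(regulator, unrelated)
pearson_r = float(np.corrcoef(regulator, target)[0, 1])

for b in sorted(dependent_mi):
    print(f"bins={b}: MI(dependent)={dependent_mi[b]:.3f}  MI(unrelated)={independent_mi[b]:.3f}")
print(f"Pearson r (dependent pair) = {pearson_r:.3f}")
```

In this toy setting the Pearson correlation of the quadratic pair stays close to zero while the MI estimate separates it from the unrelated pair. How the estimate changes with the number of bins depends on sample size and on the estimator, which is one reason the bin number needs tuning.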
Summary
Altered networks of gene regulation underlie many complex conditions, including cancer. Many diverse GRNI methods have been proposed, reflecting the enormous interest in the field, and the richness of computational mathematics, multivariate statistics and information science. These methods can be classified into two categories, unsupervised and supervised [8,9]. In the former, networks are inferred exclusively from the data (for example, differential gene expression), whereas supervised methods require additional knowledge of regulatory interactions as a training set. Methods based on mutual information capture non-linear as well as linear interactions but are applicable only to discrete data and need to employ discretization methods, which can be computationally demanding.
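For reference, the quantity that MI-based methods estimate can be written as follows. This is the standard textbook definition for two discretized expression profiles, not a formula quoted from the paper; the symbols $X$, $Y$ and $p$ are notation introduced here.

```latex
I(X;Y) \;=\; \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}
```

Here $x$ and $y$ range over the bins of the two genes, and in practice $p(x,y)$, $p(x)$ and $p(y)$ are estimated from bin counts across samples. $I(X;Y)$ is zero exactly when the two discretized profiles are independent, whatever the form of the dependence, which is why the measure is sensitive to non-linear as well as linear relationships.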