DREAM4 Challenge Research Articles

Context: Inferring gene regulatory networks (GRNs) from high-throughput gene expression data is a challenging task for which different strategies have been developed. No single method performs best in every setting, however; each has its own advantages, intrinsic biases, and application domains. To analyze a dataset, users should therefore be able to test several techniques and choose the most appropriate one. This step can be difficult and time-consuming, since most implementations are distributed independently, possibly in different programming languages. An open-source library that gathers different inference methods within a common framework is expected to be a valuable toolkit for the systems biology community.

Results: In this work, we introduce GReNaDIne (Gene Regulatory Network Data-driven Inference), a Python package that implements 18 machine learning data-driven gene regulatory network inference methods. It also includes eight generalist preprocessing techniques, suitable for both RNA-seq and microarray dataset analysis, as well as four normalization techniques dedicated to RNA-seq. In addition, the package can combine the results of different inference tools to form robust and efficient ensembles. The package has been successfully assessed on the DREAM5 challenge benchmark dataset. The open-source GReNaDIne Python package is freely available in a dedicated GitLab repository and on the Python Package Index (PyPI), the official third-party software repository for Python. The latest documentation for the GReNaDIne library is available at Read the Docs, an open-source software documentation hosting platform.

Contribution: The GReNaDIne tool represents a technological contribution to the field of systems biology. The package can be used to infer gene regulatory networks from high-throughput gene expression data using different algorithms within the same framework. To analyze their datasets, users can apply a battery of preprocessing and postprocessing tools, choose the most suitable inference method from the GReNaDIne library, and even combine the output of different methods to obtain more robust results. The results format provided by GReNaDIne is compatible with well-known complementary refinement tools such as pySCENIC.
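The idea of combining the output of several inference methods can be illustrated with a minimal sketch. It does not use the GReNaDIne API; the function names, the dictionary-based score tables, and the choice of rank averaging as the combination rule are all assumptions made for illustration only.

```python
from statistics import mean


def rank_scores(scores):
    """Map each edge to its rank (1 = highest score); tied edges share the average rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of edges tied with ordered[i]
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = shared_rank
        i = j + 1
    return ranks


def rank_average_ensemble(score_tables):
    """Combine {(regulator, target): score} tables (hypothetical format) by average rank.

    Returns (edge, average_rank) pairs, most confident edge first.
    """
    edges = set().union(*score_tables)
    # An edge that a method did not score is treated as that method's worst edge
    per_method = [
        rank_scores({e: table.get(e, float("-inf")) for e in edges})
        for table in score_tables
    ]
    averaged = {e: mean(r[e] for r in per_method) for e in edges}
    # Sort by rank, then by edge name so the order is deterministic
    return sorted(averaged.items(), key=lambda kv: (kv[1], kv[0]))


# Two hypothetical methods scoring the same three candidate edges
method_a = {("g1", "g2"): 0.9, ("g1", "g3"): 0.2, ("g2", "g3"): 0.5}
method_b = {("g1", "g2"): 0.7, ("g1", "g3"): 0.6, ("g2", "g3"): 0.1}
consensus = rank_average_ensemble([method_a, method_b])
```

Ranks are used instead of raw scores because different methods usually produce scores on incomparable scales; other combination rules are equally possible.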


Background: Reconstruction of gene regulatory networks (GRNs), also known as reverse engineering of GRNs, aims to infer the potential regulatory relationships between genes. With the development of biotechnologies such as gene chip microarrays and RNA sequencing, the high-throughput data they generate provide more opportunities to infer gene-gene interactions from gene expression data and hence to understand the underlying mechanisms of biological processes. Gene regulatory networks are known to exhibit a multiplicity of interaction mechanisms, including functional and non-functional as well as linear and non-linear relationships. Moreover, the regulatory interactions between genes and gene products are not instantaneous: the various processes involved in producing fully functional and measurable concentrations of transcription factors/proteins lead to a delay in gene regulation. Many approaches for reconstructing GRNs have been proposed, but existing ones such as probabilistic Boolean networks and dynamic Bayesian networks have various limitations and relatively low accuracy. Inferring GRNs from time series microarray or RNA-sequencing data remains a very challenging inverse problem because of its nonlinearity, high dimensionality, sparse and noisy data, and significant computational cost, which motivates the development of more effective inference methods.

Results: We developed a novel algorithm, MICRAT (Maximal Information coefficient with Conditional Relative Average entropy and Time-series mutual information), for inferring GRNs from time series gene expression data. The maximal information coefficient (MIC) is an effective measure of dependence for two-variable relationships. It captures a wide range of associations, both functional and non-functional, and thus performs well at measuring the dependence between two genes. Our approach consists of two main procedures. First, it uses the maximal information coefficient to construct an undirected graph that represents the underlying relationships between genes. Second, it directs the edges of the undirected graph to infer regulators and their targets. In this procedure, the conditional relative average entropies of each pair of nodes (genes) are used to indicate the directions of edges. Since a time delay may exist between the expression of regulators and that of their target genes, time series mutual information is combined with these entropies to direct the edges and infer the potential regulators and their targets. We evaluated the performance of MICRAT by applying it to synthetic datasets as well as real gene expression data and comparing it with other GRN inference methods. We inferred five 10-gene and five 100-gene networks from the DREAM4 challenge, which were generated using the gene expression simulator GeneNetWeaver (GNW). MICRAT was also used to reconstruct GRNs from real gene expression data, including part of the DNA-damage response pathway (SOS DNA repair network) and an experimental dataset in E. coli. The results showed that MICRAT significantly improved the inference accuracy compared to other inference methods such as TDBN.

Conclusion: In this work, a novel algorithm, MICRAT, for inferring GRNs from time series gene expression data was proposed, taking into account both the dependence and the time delay between the expression of a regulator and that of its target genes. The approach uses maximal information coefficients to reconstruct an undirected graph that represents the underlying relationships between genes. The edges are directed by combining conditional relative average entropy with time-course mutual information of pairs of genes. The proposed algorithm was evaluated on the benchmark GRNs provided by the DREAM4 challenge and on part of the real SOS DNA repair network in E. coli. The experimental study showed that the approach was comparable to other methods on 10-gene datasets and outperformed other methods on 100-gene datasets in GRN inference from time series data.
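The two-procedure structure described in this abstract (build an undirected graph from a pairwise dependence measure, then orient each edge using time-delayed information) can be sketched as follows. This is a simplified illustration under stated assumptions, not the MICRAT algorithm: plain binned mutual information stands in for MIC, the conditional relative average entropy step is omitted, and all function names, the binning scheme, and the threshold are hypothetical.

```python
from collections import Counter
from math import log2


def discretize(series, bins=3):
    """Equal-width binning of a numeric series into integer labels."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0] * len(series)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in series]


def mutual_information(x, y):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def lagged_mi(source, target, lag=1):
    """Mutual information between source at time t and target at time t + lag (lag >= 1)."""
    return mutual_information(source[:-lag], target[lag:])


def infer_directed_edges(expr, threshold=0.5, lag=1, bins=3):
    """expr maps gene name -> expression time series; returns (regulator, target) pairs."""
    disc = {gene: discretize(values, bins) for gene, values in expr.items()}
    genes = sorted(disc)
    edges = []
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            # Step 1: keep the pair only if the undirected dependence is strong enough
            # (binned mutual information is a stand-in for MIC here)
            if mutual_information(disc[a], disc[b]) < threshold:
                continue
            # Step 2: orient the edge toward the stronger time-lagged dependence
            a_to_b = lagged_mi(disc[a], disc[b], lag)
            b_to_a = lagged_mi(disc[b], disc[a], lag)
            edges.append((a, b) if a_to_b >= b_to_a else (b, a))
    return edges
```

For a hypothetical input such as `infer_directed_edges({"lexA": [...], "recA": [...]})`, each retained pair is returned as a `(regulator, target)` tuple. The threshold and number of bins would need tuning on real data.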


Articles published on DREAM4 Challenge

Inference of Gene Regulatory Networks Based on Multi-view Hierarchical Hypergraphs.

Gene regulatory network inference using mixed-norms regularized multivariate model with covariance selection.

GReNaDIne: A Data-Driven Python Library to Infer Gene Regulatory Networks from Gene Expression Data.

AGRN: accurate gene regulatory network inference using ensemble machine learning methods.

Accurate determination of causalities in gene regulatory networks by dissecting downstream target genes.

Learning complex dependency structure of gene regulatory networks from high dimensional microarray data with Gaussian Bayesian networks

Systematic inference of indirect transcriptional regulation by protein kinases and phosphatases.

PEPN-GRN: A Petri net-based approach for the inference of gene regulatory networks from noisy gene expression data.

ComHub: Community predictions of hubs in gene regulatory networks

An Ensemble Method to Reconstruct Gene Regulatory Networks Based on Multivariate Adaptive Regression Splines.

Inferring of regulatory networks from expression data using Bayesian networks

Ensembles of extremely randomized predictive clustering trees for predicting structured outputs

GREMA: modelling of emulated gene regulatory networks with confidence levels based on evolutionary intelligence to cope with the underdetermined problem.

A Fast and Furious Bayesian Network and Its Application of Identifying Colon Cancer to Liver Metastasis Gene Regulatory Networks.

Robust network inference using response logic.

Ensemble multi‐objective evolutionary algorithm for gene regulatory network reconstruction based on fuzzy cognitive maps

MICRAT: a novel algorithm for inferring gene regulatory networks using time series gene expression data

Improving eQTL Analysis Using a Machine Learning Approach for Data Integration: A Logistic Model Tree Solution.

An approach for reduction of false predictions in reverse engineering of gene regulatory networks

Inferring large graphs using ℓ1-penalized likelihood

