Abstract

Characterisation of gene-regulatory network (GRN) interactions provides a stepping stone to understanding how genes affect cellular phenotypes. Yet, despite advances in profiling technologies, GRN reconstruction from gene expression data remains a pressing problem in systems biology. Here, we devise a supervised learning approach, GRADIS, which uses support vector machines to reconstruct GRNs based on distance profiles obtained from a graph representation of transcriptomics data. Using data from Escherichia coli and Saccharomyces cerevisiae as well as synthetic networks from the DREAM4 and DREAM5 network inference challenges, we demonstrate that GRADIS outperforms state-of-the-art supervised and unsupervised approaches. This holds both when predictions of target genes for individual transcription factors are considered and when the entire network is considered. We employ experimentally verified GRNs from E. coli and S. cerevisiae to validate the predictions and obtain further insights into the performance of the proposed approach. GRADIS can accommodate other network-based representations of large-scale data, and can be readily extended to help characterise other cellular networks, including protein–protein and protein–metabolite interactions.

Highlights

  • Characterisation of gene-regulatory networks (GRNs) remains one of the key challenges in systems biology[1,2]

  • The results demonstrate that GRNs inferred by GRADIS are of higher accuracy, assessed by the area under the ROC curve and the area under the precision-recall curve, in comparison with all other

  • The results show that, using the graph-based features, a support vector machine (SVM) classifier performs better than random forests in reconstructing GRNs (Supplementary Table S3)
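The two accuracy measures named in the highlights, the area under the ROC curve (AUROC) and the area under the precision-recall curve (AUPR), can be illustrated with a minimal sketch. The function names, the toy labels and the toy scores below are hypothetical and not taken from the paper. AUPR is approximated here by average precision, which is one common estimator and not necessarily the one the authors used.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision values at each true positive.

    Assumes scores are distinct; tied scores would make the ranking ambiguous.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return total / hits


# Hypothetical example: 1 = known TF-gene interaction, 0 = no known interaction.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]  # made-up classifier confidence values

toy_auroc = auroc(toy_labels, toy_scores)
toy_aupr = average_precision(toy_labels, toy_scores)
print(f"AUROC = {toy_auroc:.3f}, AUPR (average precision) = {toy_aupr:.3f}")
```

Because known regulatory interactions are typically far rarer than non-interactions, the precision-recall view is often the more informative of the two for GRN inference.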


Summary

INTRODUCTION

Characterisation of gene-regulatory networks (GRNs) remains one of the key challenges in systems biology[1,2]. […] It first sets aside one subset, and trains an SVM classifier using all known positive instances and the two other subsets, treated as negative instances. […] from step (1) for each TF–gene pair are cast as a Euclidean-metric complete graph, where the gene can encode either a TF or a non-TF. To obtain non-redundant features, a pre-processing step is needed to cluster the data samples into a smaller number of clusters based on their similarity. This step differs from the determination of clusters based on genes, as applied in other GRN reconstruction approaches[27]. We propose a global SVM-based supervised approach, termed GRADIS, to infer GRNs from genome-wide expression data and known regulatory interactions. To provide a global supervised approach for GRN reconstruction, we build a feature vector for a TF–gene pair based on the respective expression profiles. We create a complete edge-weighted graph for […] existing supervised learning approaches for GRN reconstruction, the most widely used unsupervised approaches (i.e., CLR, ARACNE, GENIE3, iRafNet, mrnet[24] and TIGRESS[25]), and their combination following ensemble learning strategies.
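The feature construction outlined above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that each (clustered) sample contributes one 2D point, (TF expression, gene expression), that the points form a complete graph weighted by Euclidean distance, and that the edge weights in a fixed order make up the feature vector for the TF–gene pair. The function name `distance_profile` and the toy values are hypothetical.

```python
from itertools import combinations
from math import dist


def distance_profile(tf_expr, gene_expr):
    """Feature vector for one TF-gene pair (illustrative assumption, see lead-in).

    tf_expr and gene_expr hold the expression of the TF and of the candidate
    target gene across the same samples (or sample-cluster representatives).
    Sample i becomes the 2D point (tf_expr[i], gene_expr[i]); the returned list
    contains the Euclidean distances for all unordered pairs of points, i.e. the
    edge weights of the complete graph on those points, in a fixed order.
    """
    if len(tf_expr) != len(gene_expr):
        raise ValueError("both profiles must cover the same samples")
    points = list(zip(tf_expr, gene_expr))
    return [dist(p, q) for p, q in combinations(points, 2)]


# Hypothetical expression values over three sample clusters.
toy_tf = [0.0, 3.0, 0.0]
toy_gene = [0.0, 4.0, 1.0]

toy_features = distance_profile(toy_tf, toy_gene)
print(toy_features)  # one distance per pair of samples: n * (n - 1) / 2 entries
```

Feature vectors built this way for TF–gene pairs with known labels could then be passed to any SVM implementation. The number of features grows quadratically with the number of samples, which is consistent with the text's remark that samples are first clustered into a smaller number of groups.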

