Abstract

In Systems Biology, the complex relationships between different entities in the cell are modeled and analyzed using networks. To this end, a rich variety of gene regulatory network (GRN) inference algorithms has been developed in recent years. However, most algorithms rely solely on gene expression data to reconstruct the network. Because unrelated genes can have similar expression profiles, predictions can contain connections between biologically unrelated genes. Therefore, computational methods should also consider previously known biological information, such as experimentally validated interactions between transcription factors and target genes, to obtain more consistent results. In this work, we propose XGBoost for gene regulatory networks (XGRN), a supervised algorithm that combines gene expression data with previously known interactions for GRN inference. The key idea of our method is to train a regression model for each known interaction of the network and then use this model to predict new interactions. The regression is performed by XGBoost, a state-of-the-art algorithm that uses an ensemble of decision trees. In detail, XGRN learns a regression model based on the gene expression of the two interactors and then provides predictions using as input the gene expression of other candidate interactors. Application to benchmark datasets and a real large single-cell RNA-Seq experiment resulted in high performance compared to other unsupervised and supervised methods, demonstrating the ability of XGRN to provide reliable predictions.

Highlights

  • A main direction in the Systems Biology field is detecting and studying the complex relationships between different molecules in the cell

  • A regression model was trained using the profiles of a transcription factor (TF) and a known gene target to learn the relationship between them. It was tested using the profile of another gene, which was an actual target of the same TF, and the output was very similar to the TF’s profile based on the R² metric

  • We presented XGBoost for gene regulatory networks (XGRN), a local supervised method with the aim to model known interactions of a gene network and to predict new similar interactions
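
The idea in the highlights above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the XGRN implementation: a small pure-Python gradient-boosted ensemble of decision stumps stands in for XGBoost, the regression direction (target profile as input, TF profile as output) is inferred from the second highlight, and all data, names (`fit_boosted`, `tf_profile`, `candidate_related`, and so on) and hyperparameters are hypothetical.

```python
import random

def fit_stump(x, residual):
    """Best single-threshold split on x (least squares on the residual)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    total, left_sum, best = sum(residual), 0.0, None
    for k in range(1, n):
        left_sum += residual[order[k - 1]]
        if x[order[k]] == x[order[k - 1]]:
            continue  # cannot split between identical values
        right_sum = total - left_sum
        # Maximizing this quantity minimizes the squared error of the split.
        gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if best is None or gain > best[0]:
            thr = (x[order[k - 1]] + x[order[k]]) / 2
            best = (gain, thr, left_sum / k, right_sum / (n - k))
    if best is None:  # x is constant: fall back to predicting the mean
        return float("inf"), total / n, total / n
    return best[1:]

def fit_boosted(x, y, n_rounds=200, lr=0.1):
    """Gradient boosting with stumps (a stand-in for XGBoost, an assumption)."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        thr, left, right = fit_stump(x, residual)
        stumps.append((thr, left, right))
        pred = [p + lr * (left if xi <= thr else right)
                for p, xi in zip(pred, x)]
    return base, lr, stumps

def predict(model, x):
    base, lr, stumps = model
    return [base + lr * sum(l if xi <= t else r for t, l, r in stumps)
            for xi in x]

def r2_score(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot

# Synthetic expression profiles over 60 samples (toy data, not from the paper).
rng = random.Random(0)
tf_profile = [rng.uniform(0, 10) for _ in range(60)]
known_target = [2 * t + 1 + rng.gauss(0, 0.3) for t in tf_profile]
candidate_related = [2 * t + 1 + rng.gauss(0, 0.3) for t in tf_profile]
candidate_unrelated = [rng.uniform(1, 21) for _ in range(60)]

# Learn the known interaction: target expression -> TF expression.
model = fit_boosted(known_target, tf_profile)

# Score candidates by how well the model's output matches the TF profile.
score_related = r2_score(tf_profile, predict(model, candidate_related))
score_unrelated = r2_score(tf_profile, predict(model, candidate_unrelated))
print(f"related: {score_related:.3f}  unrelated: {score_unrelated:.3f}")
```

In this toy setting, a candidate whose expression relates to the TF in the same way as the known target should receive a clearly higher R² than an unrelated gene. In practice the stump ensemble would be replaced by an XGBoost regressor and the scores ranked across all candidate genes.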

Introduction

A main direction in the Systems Biology field is detecting and studying the complex relationships between different molecules in the cell. Network modeling has been extensively used to analyze the interactions between genes, mRNAs, proteins or metabolites [1], as well as other entities, such as diseases [2,3] or drugs [4,5]. This approach has given rise to the field of Network Medicine, which analyzes complex diseases that can concurrently affect many genes [6,7,8]. Transcription factors (TFs) are proteins that bind to DNA and regulate the expression of genes, i.e., they can activate or inhibit the transcription
