Abstract

Background: Recently, supervised learning methods have been exploited to reconstruct gene regulatory networks from gene expression data. The reconstruction of a network is modeled as a binary classification problem for each pair of genes. A statistical classifier is trained to recognize the relationships between the activation profiles of gene pairs. This approach has been proven to outperform previous unsupervised methods. However, the supervised approach raises open questions. In particular, although known regulatory connections can safely be assumed to be positive training examples, obtaining negative examples is not straightforward, because definite knowledge is typically not available that a given pair of genes do not interact.

Results: A recent advance in research on data mining is a method capable of learning a classifier from only positive and unlabeled examples, that does not need labeled negative examples. Applied to the reconstruction of gene regulatory networks, we show that this method significantly outperforms the current state of the art of machine learning methods. We assess the new method using both simulated and experimental data, and obtain major performance improvement.

Conclusions: Compared to unsupervised methods for gene network inference, supervised methods are potentially more accurate, but for training they need a complete set of known regulatory connections. A supervised method that can be trained using only positive and unlabeled data, as presented in this paper, is especially beneficial for the task of inferring gene regulatory networks, because only an incomplete set of known regulatory connections is available in public databases such as RegulonDB, TRRD, KEGG, Transfac, and IPA.
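To make the idea of learning from only positive and unlabeled gene pairs concrete, here is a minimal sketch of one published calibration step for this setting, due to Elkan and Noto. Whether it matches the method assessed here in every detail is an assumption. The sketch also assumes that some classifier has already been trained to separate known (labeled) interactions from unlabeled gene pairs, that its scores approximate p(labeled | x), and that the known interactions are a random sample of all true interactions. The function names are hypothetical.

```python
def estimate_label_frequency(scores_on_known_positives):
    """Estimate c = p(labeled | positive) as the mean classifier score
    on held-out gene pairs that are known to interact."""
    if not scores_on_known_positives:
        raise ValueError("need at least one held-out known positive")
    return sum(scores_on_known_positives) / len(scores_on_known_positives)


def corrected_probabilities(scores, c):
    """Turn scores approximating p(labeled | x) into estimates of
    p(positive | x) = p(labeled | x) / c, capped at 1."""
    if c <= 0:
        raise ValueError("label frequency c must be positive")
    return [min(1.0, s / c) for s in scores]


# Hypothetical scores from a classifier trained on labeled vs. unlabeled pairs.
held_out_positive_scores = [0.4, 0.5, 0.6]
unlabeled_pair_scores = [0.1, 0.25, 0.45, 0.6]

c = estimate_label_frequency(held_out_positive_scores)
interaction_probabilities = corrected_probabilities(unlabeled_pair_scores, c)
```

Dividing by c compensates for the fact that many true interactions sit unlabeled in the training set, so a raw "labeled vs. unlabeled" score systematically underestimates the chance that a pair interacts.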

Highlights

  • Supervised learning methods have been exploited to reconstruct gene regulatory networks from gene expression data

  • RQ1: How do PosOnly, PSEUDO-RANDOM, and SVMOnly performances vary with the percentage of known positives? In particular, this research question aims to compare the performances of PosOnly, PSEUDO-RANDOM, and SVMOnly when the percentage of known positives varies from 10% to 100%

  • RQ3: How do PosOnly, PSEUDO-RANDOM, and SVMOnly performances compare with unsupervised information theoretic approaches, such as ARACNE and CLR? In particular, this research question aims to compare the supervised learning approaches PosOnly, PSEUDO-RANDOM, and SVMOnly with unsupervised information theoretic approaches at different network sizes and at different percentages of known positives


Summary

Introduction

Supervised learning methods have been exploited to reconstruct gene regulatory networks from gene expression data. A statistical classifier is trained to recognize the relationships between the activation profiles of gene pairs. This approach has been proven to outperform previous unsupervised methods. In silico methods represent a promising direction that, through a reverse engineering approach, aims to extract gene regulatory networks from prior biological knowledge and available genomic and post-genomic data. Different model architectures to reverse engineer gene regulatory networks from gene expression data have been proposed in the literature [1]. Such models represent biological regulations as a network where nodes represent elements of [...]

Information theory models correlate two genes by means of a correlation coefficient and a threshold. TD-ARACNE [2], ARACNE [3], and CLR [4] infer the network structure with a statistical score derived from the mutual information and a set of pruning heuristics.
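The score-and-threshold idea behind information theoretic models can be sketched as follows. This is only an illustration under stated assumptions: expression profiles are discretized into equal-width bins, mutual information is computed with a simple plug-in estimate, and an edge is kept when the score exceeds a fixed threshold. The pruning heuristics that ARACNE, TD-ARACNE, and CLR apply are not reproduced, and the function names, bin count, and threshold are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def discretize(values, n_bins=3):
    """Map each value to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=3):
    """Plug-in estimate of mutual information (in bits) between two profiles."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(bx)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    mi = 0.0
    for (i, j), count in pxy.items():
        p_joint = count / n
        # p_joint / (p_x * p_y), written with raw counts
        mi += p_joint * math.log2(p_joint * n * n / (px[i] * py[j]))
    return mi


def infer_edges(expression, threshold, n_bins=3):
    """Return (gene1, gene2, score) for pairs whose score exceeds threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        score = mutual_information(expression[g1], expression[g2], n_bins)
        if score > threshold:
            edges.append((g1, g2, score))
    return edges


# Toy expression profiles (one value per experiment), invented for illustration.
expression = {
    "geneA": [0.0, 1.0, 2.0, 0.0, 1.0, 2.0],
    "geneB": [10.0, 20.0, 30.0, 10.0, 20.0, 30.0],  # tracks geneA
    "geneC": [0.0, 0.0, 0.0, 2.0, 2.0, 2.0],        # unrelated to geneA
}
edges = infer_edges(expression, threshold=0.5)
```

On this toy data only the geneA and geneB pair passes the threshold. An unsupervised score like this needs no known interactions at all, which is the main contrast with the supervised classifiers discussed above.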

