Abstract

Background

Supervised machine learning approaches have recently been adopted to infer transcriptional targets from high-throughput transcriptomic and proteomic data, showing major improvements over state-of-the-art methods for reverse engineering gene regulatory networks. Unlike traditional unsupervised techniques, a supervised classifier learns from known examples a function that can recognize new relationships in new data. In gene regulatory inference, a supervised classifier is forced to learn from positive and unlabeled examples only, because negative examples are unavailable or hard to collect. This condition can limit the performance of the classifier, especially when few training examples are available.

Results

In this paper we improve the supervised identification of transcriptional targets by selecting reliable negative examples from the unlabeled set. We introduce a heuristic, based on the known topology of transcriptional networks, that in effect restores the conventional positive/negative training condition and significantly improves classification performance. We empirically evaluate the proposed heuristic on experimental datasets of Escherichia coli and show an example application: the prediction of BCL6 direct core targets in normal germinal center human B cells, where we obtain a precision of 60%.

Conclusions

The availability of only positive examples when learning transcriptional relationships negatively affects the performance of supervised classifiers. We show that selecting reliable negative examples, a practice adopted in text mining, improves the performance of such classifiers and opens new perspectives in the identification of new transcriptional targets.

Highlights

  • Supervised machine learning approaches have recently been adopted to infer transcriptional targets from high-throughput transcriptomic and proteomic data, showing major improvements over state-of-the-art methods for reverse engineering gene regulatory networks

  • We showed that selecting reliable negative examples, a practice adopted in text mining, can improve the performance of such classifiers and opens new perspectives in predicting new transcriptional targets

  • We introduced a new negative selection heuristic, NOIT, that promotes, as negative candidates for a transcription factor, genes that are not regulated indirectly through other transcription factors


Summary

Introduction

Supervised machine learning approaches have recently been adopted to infer transcriptional targets from high-throughput transcriptomic and proteomic data, showing major improvements over state-of-the-art methods for reverse engineering gene regulatory networks. Most reverse-engineering methods are essentially unsupervised and model the reconstruction of transcriptional relationships as a classification problem, where the basic decision is the presence or absence of a relationship between a given pair of genes [3,4,5,6]. These methods fall into four groups: i) gene relevance network models, which detect gene-gene interactions with a similarity measure and a threshold, such as ARACNE [7], TimeDelay-ARACNE [8], and CLR [9], which infer the network structure with a statistical score derived from mutual information and a set of pruning heuristics; ii) Boolean network models, which represent the activity state of each gene with a binary variable and the network as a directed graph whose edges are described by Boolean functions (e.g. REVEAL [10]); iii) differential and difference equation models, which describe changes in the expression of a gene as a function of the expression levels of other genes through a set of ordinary differential equations (ODE) [11]; and iv) Bayesian models, or more generally graphical models, which apply Bayes' rule and treat gene expression levels as random variables [12].
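To make the first family concrete, the following is a minimal sketch of a gene relevance network: pairwise mutual information between expression profiles, followed by a fixed threshold. It is an illustration only and not the implementation of ARACNE, TimeDelay-ARACNE, or CLR, all of which add their own statistical scoring and pruning heuristics. The function names, the histogram-based estimator, the number of bins, the threshold value, and the toy data are all assumptions made for this example.

```python
import numpy as np


def mutual_information(x, y, bins=4):
    """Estimate mutual information (in nats) between two expression
    profiles by binning them into a 2-D histogram (a simple plug-in
    estimator, assumed here for illustration)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                # joint distribution
    px = pxy.sum(axis=1, keepdims=True)      # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)      # marginal of y, shape (1, bins)
    independent = px @ py                    # product of the marginals
    nonzero = pxy > 0                        # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / independent[nonzero])))


def relevance_network(expr, threshold, bins=4):
    """Build an undirected relevance network.

    expr: array of shape (n_genes, n_samples).
    Returns a boolean adjacency matrix in which an edge links two genes
    whose estimated mutual information reaches the threshold.
    """
    n_genes = expr.shape[0]
    adjacency = np.zeros((n_genes, n_genes), dtype=bool)
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            if mutual_information(expr[i], expr[j], bins) >= threshold:
                adjacency[i, j] = adjacency[j, i] = True
    return adjacency


# Toy data (hypothetical): gene 1 copies gene 0, gene 2 varies independently.
gene0 = np.tile([0.0, 1.0, 2.0, 3.0], 50)
gene1 = gene0.copy()
gene2 = np.repeat([0.0, 1.0, 2.0, 3.0], 50)
expression = np.vstack([gene0, gene1, gene2])

network = relevance_network(expression, threshold=0.5)
print(network)
```

Because mutual information is symmetric, a network built this way is undirected and cannot by itself tell a regulator from its target; this is one reason the methods cited above combine the score with additional heuristics.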
