Abstract

Background: Novel strategies are required to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes by building so-called gene co-expression networks. These are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful for determining whether two genes have a strong global similarity, but they do not detect local similarities.

Results: We propose model trees as a method to identify gene interaction networks. Whereas correlation-based methods analyze each pair of genes, our approach generates a single regression tree for each gene from the remaining genes. Finally, a graph is built from all the relationships among output and input genes, keeping only the gene pairs whose relationship is statistically significant; to this end, we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is tested experimentally on two well-known data sets: Saccharomyces cerevisiae and E. coli. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as a control to compare the results with those of a correlation-based method. This experiment shows that REGNET detects true gene associations more accurately than the Pearson and Spearman zeroth- and first-order correlation-based methods.

Conclusions: REGNET generates gene association networks from gene expression data and differs from correlation-based methods in that the relationships between one gene and the others are calculated simultaneously. Model trees are very useful for estimating the numerical values of the target genes by linear regression functions. They are very often more precise than linear regression models because they can fit different linear regressions to separate areas of the search space, which favors inferring localized similarities over a more global similarity. Furthermore, experimental results show the good performance of REGNET.
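The contrast between global and local similarity can be made concrete with a deliberately small sketch. It is not REGNET's implementation: it assumes a single input gene, a single split, and made-up expression values, and the names `fit_line`, `one_split_model_tree`, `xs`, and `ys` are hypothetical. The target rises with the input gene in one range and falls in the other, so the global Pearson correlation is close to zero while two leaf regressions fit the data almost exactly.

```python
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def sse(xs, ys, slope, intercept):
    """Sum of squared residuals of a fitted line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def one_split_model_tree(xs, ys, min_leaf=3):
    """Pick the split on x that minimizes the summed SSE of two leaf regressions.

    Returns (error, threshold, left_fit, right_fit), where each fit is a
    (slope, intercept) pair. A one-level simplification of a model tree.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        left, right = pairs[:i], pairs[i:]
        if left[-1][0] == right[0][0]:
            continue  # cannot place a threshold between equal x values
        lx, ly = zip(*left)
        rx, ry = zip(*right)
        lfit, rfit = fit_line(lx, ly), fit_line(rx, ry)
        err = sse(lx, ly, *lfit) + sse(rx, ry, *rfit)
        if best is None or err < best[0]:
            threshold = (left[-1][0] + right[0][0]) / 2
            best = (err, threshold, lfit, rfit)
    return best


# Made-up profiles: the target follows the input up to 0.5, then mirrors it.
xs = [i / 10 for i in range(11)]
ys = [2 * x if x <= 0.5 else 2 - 2 * x for x in xs]

global_r = pearson(xs, ys)
global_sse = sse(xs, ys, *fit_line(xs, ys))
tree_sse, threshold, left_fit, right_fit = one_split_model_tree(xs, ys)

print("global Pearson r:", round(global_r, 6))
print("SSE of one global line:", round(global_sse, 4))
print("SSE of two leaf regressions:", round(tree_sse, 6), "split at", threshold)
```

In this toy case a correlation threshold would miss the relationship entirely, whereas the split recovers one increasing and one decreasing linear segment.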

Highlights

  • Novel strategies are required in order to handle the huge amount of data produced by microarray technologies

  • Our method reports as gene-gene interactions all dependencies that are statistically significant

  • The task of learning Bayesian Networks (BN) from data is NP-hard with respect to the number of network vertices, i.e. Bayesian methods are computationally intractable for a huge number of genes
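The highlight on statistically significant dependencies relies on controlling the false discovery rate over many candidate gene pairs. The source does not name the procedure used, so the sketch below assumes the Benjamini-Hochberg step-up procedure as one common choice; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Benjamini-Hochberg step-up: sort the p-values, find the largest rank k
    with p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per candidate gene pair.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(p_values, alpha=0.05)
print(significant)
```

Only the pairs flagged `True` would be kept as edges of the network under this assumption.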


Summary

Introduction

Novel strategies are required to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes by building so-called gene co-expression networks. These networks are typically generated using correlation statistics as pairwise similarity measures between gene expression profiles over multiple conditions, as for example in [3]. In this kind of method, if the correlation between a pair of genes is higher than a threshold value, the pair is considered to interact directly in a signaling pathway and to be biologically relevant [4,5,6]. The causal properties that can be inferred from correlations have been investigated in [9,10].
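The thresholding step described above can be sketched as follows. This is a minimal illustration, not the procedure of any cited work: the gene names, expression values, and the 0.8 cutoff are made up, and comparing the absolute value of the correlation (so that strong negative correlations also count) is an assumption.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def coexpression_edges(profiles, threshold=0.8):
    """Return (gene_a, gene_b, r) for every pair with |r| >= threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges


# Hypothetical expression profiles over five conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.5],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}

edges = coexpression_edges(profiles, threshold=0.8)
for a, b, r in edges:
    print(a, b, round(r, 3))
```

Every pair is scored independently of all other genes, which is the pairwise behavior that the per-gene regression approach is contrasted with.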

