Abstract

Background: Analysis of microarray data has been used for the inference of gene-gene interactions. If, however, the aim is the discovery of disease-related biological mechanisms, then the criterion for defining such interactions must be specifically linked to disease.

Results: Here we present a computational methodology that jointly analyzes two sets of microarray data, one in the presence and one in the absence of a disease. Using the recently developed information theoretic measure of synergy, it identifies gene pairs whose correlation with disease is due to cooperative, rather than independent, contributions of genes. High levels of synergy in a gene pair indicate possible membership of the two genes in a shared pathway and lead to a graphical representation of inferred gene-gene interactions associated with disease, in the form of a "synergy network." We apply this technique to a set of publicly available prostate cancer expression data and successfully validate our results, confirming that they cannot be due to pure chance and providing a biological explanation for gene pairs with exceptionally high synergy.

Conclusion: Thus, synergy networks provide a computational methodology helpful for deriving "disease interactomes" from biological data. When coupled with additional biological knowledge, they can also be helpful for deciphering biological mechanisms responsible for disease.

Highlights

  • Analysis of microarray data has been used for the inference of gene-gene interactions

  • The amount of information about cancer that is due to the purely cooperative effects among all the members of a gene set can be quantified using information theoretic tools [15,18]. The synergy of a gene pair (G1, G2) with respect to cancer (C) was previously defined as I(G1, G2; C) - [I(G1; C) + I(G2; C)], where I denotes mutual information

  • Validation with independent gene expression dataset: To confirm that our results hold for independently obtained samples, we used a prostate cancer gene expression dataset containing values for 25 malignant and 8 healthy samples from a different laboratory [26], to which we refer as the "validation dataset." We found that direct numerical evaluation of synergy from the validation dataset is not meaningful: the P value for even the top-ranked gene pair is 0.10 (Additional File 2), so the results are not statistically significant
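
The synergy definition quoted in the highlights above can be illustrated with a small numerical sketch. This is not the authors' implementation. It assumes that expression values have already been discretized (here to 0/1) and that mutual information is estimated with a simple plug-in (empirical frequency) estimator; both are assumptions made for illustration, and the function names `entropy`, `mutual_information`, and `synergy` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy in bits of a sequence of hashable symbols (plug-in estimate)."""
    n = len(symbols)
    return -sum((k / n) * log2(k / n) for k in Counter(symbols).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from empirical frequencies."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def synergy(g1, g2, c):
    """I(G1, G2; C) - [I(G1; C) + I(G2; C)] for discretized expression vectors.

    g1, g2: per-sample discretized expression levels of the two genes.
    c:      per-sample phenotype label (e.g. 1 = cancer, 0 = healthy).
    """
    pair = list(zip(g1, g2))  # treat the gene pair as one joint variable
    return mutual_information(pair, c) - (
        mutual_information(g1, c) + mutual_information(g2, c)
    )


# Toy data, not from the paper: the phenotype is the XOR of the two genes,
# so neither gene alone is informative but the pair determines the label.
g1 = [0, 0, 1, 1]
g2 = [0, 1, 0, 1]
c_xor = [0, 1, 1, 0]
print(synergy(g1, g2, c_xor))  # 1.0 bit: purely cooperative

# Redundant case: both genes are copies of the phenotype label.
c_copy = [0, 0, 1, 1]
print(synergy(c_copy, c_copy, c_copy))  # -1.0 bit: redundant, not synergistic
```

Positive values mean the pair carries more information about the phenotype jointly than the two genes do separately; negative values indicate redundancy. With few samples, as in the validation dataset described above, a plug-in estimate of this kind is noisy, which is consistent with the need for a significance test before interpreting any single value.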


Summary

Introduction

Analysis of microarray data has been used for the inference of gene-gene interactions. To identify interactions that are related to disease, we may wish to apply a traditional gene interaction network inference methodology, such as Bayesian network inference, to each of two microarray data sets, for example one representing healthy samples (tissues) and another representing cancerous samples. We would then compare the two resulting networks (the "normal" network and the one that has been "rewired" due to the disease) in an effort to identify differences in gene membership and network topology that may be related to the phenotype. However, constructing the topology of network graphs often requires the use of heuristic or greedy algorithms that are sensitive to the number of biological samples in each of the two sets of microarray data, as well as to noise in the expression data. It therefore becomes unclear how the differences between the two networks will identify gene interactions that are linked to disease. The methodology presented here provides insight that existing methods cannot provide.

