Abstract

Background: The accurate detection of differentially expressed (DE) genes has become a central task in microarray analysis. Unfortunately, the noise level and experimental variability of microarrays can be limiting. While a number of existing methods partially overcome these limitations by incorporating biological knowledge in the form of gene groups, these methods sacrifice gene-level resolution. This loss of precision can be inappropriate, especially if the desired output is a ranked list of individual genes. To address this shortcoming, we developed M-BISON (Microarray-Based Integration of data SOurces using Networks), a formal probabilistic model that integrates background biological knowledge with microarray data to predict individual DE genes.

Results: M-BISON improves signal detection on a range of simulated data, particularly when using very noisy microarray data. We also applied the method to the task of predicting heat shock-related differentially expressed genes in S. cerevisiae, using an hsf1 mutant microarray dataset and conserved yeast DNA sequence motifs. Our results demonstrate that M-BISON improves the analysis quality and makes predictions that are easy to interpret in concert with incorporated knowledge. Specifically, M-BISON increases the AUC of DE gene prediction from .541 to .623 when compared to a method using only microarray data, and M-BISON outperforms a related method, GeneRank. Furthermore, by analyzing M-BISON predictions in the context of the background knowledge, we identified YHR124W as a potentially novel player in the yeast heat shock response.

Conclusion: This work provides a solid foundation for the principled integration of imperfect biological knowledge with gene expression data and other high-throughput data sources.

Highlights

  • The accurate detection of differentially expressed (DE) genes has become a central task in microarray analysis

  • For each run at each parameter combination $\alpha_t = [\alpha_i^{(NDE)} \;\; \alpha_j^{(DE)}]^T$, we used the resultant DE scores $B^*(\alpha_t)$, compared to the known truth, to estimate an area under the curve (AUC) of the receiver operating characteristic

  • Using empirical M-BISON, we show a substantial increase in AUC and pAUC.2 when compared to performance using the B statistic with array data alone
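The evaluation described in the highlights above, scoring each gene and comparing the ranking to known truth, can be illustrated with a minimal sketch. It is not the authors' code. It assumes that higher scores mean stronger evidence of differential expression, that truth labels are 1 for DE and 0 for non-DE, and that "pAUC.2" denotes the unnormalized area under the ROC curve restricted to false positive rates of at most 0.2 (this summary does not define the term). The function names `roc_points` and `auc` and the toy data are hypothetical.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one DE and one non-DE gene")
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        # Genes with tied scores cross the threshold together.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points, fpr_max=1.0):
    """Trapezoidal area under the ROC curve for FPR in [0, fpr_max]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= fpr_max:
            break
        if x1 > fpr_max:
            # Clip the final segment at fpr_max by linear interpolation.
            y1 = y0 + (y1 - y0) * (fpr_max - x0) / (x1 - x0)
            x1 = fpr_max
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy example (made-up scores and labels, not data from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_truth = [1, 1, 0, 1, 0, 0]
toy_curve = roc_points(toy_scores, toy_truth)
full_auc = auc(toy_curve)                  # area over the whole FPR range
partial_auc = auc(toy_curve, fpr_max=0.2)  # assumed reading of pAUC.2
```

In this toy ranking, 8 of the 9 possible (DE, non-DE) gene pairs are ordered correctly, so the full AUC is 8/9. The partial area is left unnormalized here; some tools rescale it, so values are only comparable when the same convention is used.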


Summary

Introduction

The accurate detection of differentially expressed (DE) genes has become a central task in microarray analysis. Because only a small number of genes are differentially expressed between two conditions, it becomes difficult to separate the biologically relevant genes from the vast majority of genes that are unchanged or whose changes are artifactual. This stems in large part from the inherent noisiness and often poor reproducibility of the microarray assay, especially with respect to genes expressed at low levels [2,3]. While a number of existing methods partially overcome these limitations by incorporating biological knowledge in the form of gene groups, these methods sacrifice gene-level resolution. This loss of precision can be inappropriate, especially if the desired output is a ranked list of individual genes. Although our knowledge of the underlying biological networks is incomplete, it is attractive to have methods that use this information.

