Abstract

This paper addresses the problem of identifying genes that are differentially expressed across conditions from gene expression microarray data in the presence of outliers. To this end, gene expression data are modeled robustly using the family of normal/independent distributions. This family includes the normal and Student's t distributions, which have been used previously, as well as extensions such as the slash, contaminated normal and Laplace distributions. The aim is to identify differentially expressed genes under these distributional assumptions rather than under the normal distribution alone. A Bayesian approach based on Markov chain Monte Carlo is adopted for parameter estimation. Two publicly available gene expression data sets are analyzed with the proposed approach, and the use of the robust models for detecting differentially expressed genes is investigated. The investigation shows that the choice of model matters considerably, because each gene has only a small number of replicates and the data contain outliers. The models are compared using several statistical criteria and the ROC curve, and the method is further illustrated with simulation studies. The results demonstrate the flexibility of these robust models in identifying differentially expressed genes.

Highlights

  • Microarrays allow the simultaneous measurement of the expression levels of thousands of genes

  • The results show that, at a false discovery rate (FDR) below 0.44276, 2739 genes are detected as differentially expressed

  • For each κ value, the different criteria, Bayesian false discovery rate (bFDR), Bayesian true negative rate (bTNR) and Bayesian false negative rate (bFNR), yield nearly the same number of differentially expressed genes
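
As a rough illustration of how a threshold κ and the bFDR interact, the sketch below computes a posterior-expected false discovery rate from per-gene posterior probabilities of differential expression. The paper's own definition is in its Appendix B, which is not reproduced here, so this uses one common form of the Bayesian FDR and should be read as an assumption. The function name `bayesian_fdr` and the input `post_prob_de` are hypothetical.

```python
import numpy as np

def bayesian_fdr(post_prob_de, kappa):
    """Posterior-expected FDR among genes called at threshold kappa.

    post_prob_de: posterior probability that each gene is differentially
                  expressed (hypothetical input, e.g. from MCMC output).
    kappa:        a gene is called when its probability exceeds kappa.
    Returns (bFDR estimate, number of genes called).
    """
    p = np.asarray(post_prob_de, dtype=float)
    called = p > kappa
    n_called = int(called.sum())
    if n_called == 0:
        # No genes called: report zero by convention (an assumption).
        return 0.0, 0
    # Each called gene is a false discovery with probability 1 - p.
    return float((1.0 - p[called]).sum() / n_called), n_called

# Toy probabilities, not data from the paper.
fdr, n = bayesian_fdr([0.99, 0.9, 0.6, 0.2], kappa=0.5)
```

Raising κ calls fewer genes and usually lowers this estimate, which is the trade-off that a table of gene counts per κ value summarizes.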


Summary

Introduction

Microarrays allow the simultaneous measurement of the expression levels of thousands of genes. [14] introduced a Laplace mixture model as a long-tailed alternative to the normal distribution for identifying differentially expressed genes in microarray experiments. That model is more flexible than models in current use because it can, at least with sufficient data, accommodate both whole-genome and restricted-coverage arrays. Here, the Bayesian hierarchical model of [2, 13, 18] is extended by using the family of normal/independent (N/I) distributions for the errors, which yields more robust models for analyzing gene expression microarray data. Details of the members of the N/I family and an analysis of the Bayesian false discovery rate are given in Appendices A and B, respectively.
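
To make the N/I family concrete, the sketch below draws samples using the standard scale-mixture representation y = μ + σ Z / √U, where Z is standard normal and U is a positive mixing variable independent of Z. The paper's exact parameterization is in its Appendix A and is not shown here, so the mixing distributions are the usual textbook choices, and the names `sample_ni`, `nu`, `eps` and `gamma` are assumptions made for illustration.

```python
import numpy as np

def sample_ni(family, size, mu=0.0, sigma=1.0, nu=4.0, eps=0.1, gamma=0.1, rng=None):
    """Draw from y = mu + sigma * Z / sqrt(U), with Z ~ N(0, 1) and U > 0.

    nu:         degrees of freedom (Student t) or shape (slash).
    eps, gamma: contamination probability and scale factor in (0, 1)
                for the contaminated normal.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(size)
    if family == "normal":
        u = np.ones(size)                                  # U = 1
    elif family == "student_t":
        # U ~ Gamma(shape nu/2, rate nu/2)
        u = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=size)
    elif family == "slash":
        u = rng.beta(nu, 1.0, size=size)                   # U ~ Beta(nu, 1)
    elif family == "contaminated_normal":
        # U = gamma with probability eps, otherwise 1
        u = np.where(rng.random(size) < eps, gamma, 1.0)
    elif family == "laplace":
        # 1/U ~ Exponential(mean 2) gives a Laplace with scale sigma
        u = 1.0 / rng.exponential(scale=2.0, size=size)
    else:
        raise ValueError(f"unknown family: {family}")
    return mu + sigma * z / np.sqrt(u)

# Example: heavy-tailed errors for simulated expression values.
errors = sample_ni("slash", size=1000, rng=np.random.default_rng(1))
```

Small values of U inflate the variance of individual observations, which is how these models absorb outlying replicates without distorting the estimates for the rest of the data.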

Golub data
The hereditary breast cancer data
Two-group case
Multiple-group case
The Golub data
The BRCA data
Simulation Studies
Simulation study 1
Simulation study 2
Findings
Conclusion