Abstract

Currently, statistical techniques for analysis of microarray-generated data sets have deficiencies due to limited understanding of errors inherent in the data. A generalized likelihood ratio (GLR) test based on an error model was recently proposed to identify differentially expressed genes from microarray experiments. However, the use of different error structures under the GLR test has not been evaluated, nor has this method been compared to commonly used statistical tests such as the parametric t-test. The concomitant effects of varying data signal-to-noise ratio and replication number on the performance of statistical tests also remain largely unexplored. In this study, we compared the effects of different underlying statistical error structures on the GLR test's power in identifying differentially expressed genes in microarray data. We evaluated these variants of the GLR test, as well as the one-sample t-test, on simulated data by means of receiver operating characteristic (ROC) curves. Further, we used bootstrapping of ROC curves to assess the statistical significance of differences between the areas under the curves. Our results showed that i) the GLR tests outperformed the t-test for detecting differential gene expression, ii) the identity of the underlying error structure was important in determining the GLR tests' performance, and iii) signal-to-noise ratio was a more important contributor than sample replication in identifying statistically significant differential gene expression.

Highlights

  • The development of microarray technology has been phenomenal in the past decade, with transcriptional profiling now a standard tool in many genomics research laboratories (Ginsberg and Mirnics, 2006; Rosa et al., 2006)

  • In GLR3, we explicitly modeled the observation that correlation between expression levels under control and treatment conditions decreases at lower intensities by weighting the multiplicative error term using the reciprocals of the mean intensity, as shown in (5)

  • The overall patterns of the receiver operating characteristic (ROC) curves with respect to varying signal-to-noise ratio and number of replications per gene were similar among the statistical tests


Summary

Introduction

The development of microarray technology has been phenomenal in the past decade, with transcriptional profiling now a standard tool in many genomics research laboratories (Ginsberg and Mirnics, 2006; Rosa et al., 2006). A popular method for detecting differences in gene expression has been the use of fold-change cutoffs (Chattopadhyay et al., 2007; Shimada et al., 2007). This approach seeks genes whose expression intensities change, for example, by a factor of two or more between control and treatment samples. The fixed-threshold cutoff method is not based on specific data modeling assumptions and is statistically inefficient because it cannot account for the numerous systemic and biological variations inherent in a microarray experiment (Jaluria et al., 2007). Another commonly used method is the traditional parametric t-test. The performance of the t-test depends on the sample size and on whether the expression intensities can be assumed to be normally distributed (Riva et al., 2005).
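To make the contrast between the two conventional approaches concrete, the following is a minimal sketch, not the procedure used in this study. It assumes each gene is summarized by four replicate log2 ratios of treatment to control; the gene names, the replicate values, and the function names are hypothetical, and the critical value is the standard two-sided 5% value of Student's t with 3 degrees of freedom.

```python
import math
import statistics

# Two-sided 5% critical value of Student's t with 3 degrees of freedom
# (n = 4 replicates). Hard-coded as an assumption for this sketch.
T_CRIT_DF3 = 3.182


def fold_change_call(log2_ratios, threshold=1.0):
    """Flag a gene when |mean log2 ratio| reaches the cutoff (1.0 = two-fold)."""
    return abs(statistics.mean(log2_ratios)) >= threshold


def one_sample_t(log2_ratios):
    """One-sample t statistic for the null hypothesis: mean log2 ratio = 0."""
    n = len(log2_ratios)
    mean = statistics.mean(log2_ratios)
    sd = statistics.stdev(log2_ratios)  # sample standard deviation (n - 1)
    if sd == 0:
        # Degenerate case: identical replicates.
        return math.copysign(math.inf, mean) if mean != 0 else 0.0
    return mean / (sd / math.sqrt(n))


def t_test_call(log2_ratios, critical_value=T_CRIT_DF3):
    """Flag a gene when |t| exceeds the supplied critical value."""
    return abs(one_sample_t(log2_ratios)) > critical_value


# Hypothetical replicate log2 ratios, invented purely for illustration.
genes = {
    "large_consistent": [1.3, 1.1, 1.2, 1.2],
    "large_noisy": [3.0, -0.5, 2.5, -0.2],
    "small_consistent": [0.5, 0.55, 0.45, 0.5],
}

for name, ratios in genes.items():
    print(name, fold_change_call(ratios), t_test_call(ratios))
```

With these invented values, the two-fold cutoff flags the noisy gene and misses the small but consistent one, while the t statistic does the opposite. This is the sense in which a fixed threshold ignores replicate variability.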
