Abstract

BackgroundThe detection of small yet statistically significant differences in gene expression in spotted DNA microarray studies is an ongoing challenge. Meeting this challenge requires careful examination of the performance of a range of statistical models, as well as empirical examination of the effect of replication on the power to resolve these differences.ResultsNew models are derived and software is developed for the analysis of microarray ratio data. These models incorporate multiplicative small error terms, and error standard deviations that are proportional to expression level. The fastest and most powerful method incorporates additive small error terms and error standard deviations proportional to expression level. Data from four studies are profiled for the degree to which they reveal statistically significant differences in gene expression. The gene expression level at which there is an empirical 50% probability of a significant call is presented as a summary statistic for the power to detect small differences in gene expression.ConclusionsUnderstanding the resolution of difference in gene expression that is detectable as significant is a vital component of experimental design and evaluation. These small differences in gene expression level are readily detected with a Bayesian analysis of gene expression level that has additive error terms and constrains samples to have a common error coefficient of variation. The power to detect small differences in a study may then be determined by logistic regression.

Highlights

  • The detection of small yet statistically significant differences in gene expression in spotted DNA microarray studies is an ongoing challenge

  • The Bayesian Information Criterion for model selection can be used to choose between models that invoke distinct error variances or coefficients of variation for each node as characterized by genotype, environment, and developmental state, and the nested models that invoke a single variance or coefficient of variation (CV) for all nodes

  • For unconstrained models that estimate a variance or CV term for each node, the analysis model incorporating an assumption of small multiplicative error terms had greater power to detect differences than the analysis model incorporating an assumption of small additive error terms

Read more

Summary

Results

The general and nested models were implemented on two independent published data sets large enough to estimate parameters within the general model [10,26]. In the analysis of data simulated as a ratio of two normal distributions, model AC exhibited the greatest power to detect true differences in gene expression (Figure 1). In the analysis of data simulated as a ratio of two lognormal distributions, model AC again exhibited the greatest power to detect true differences in gene expression (Figure 2). In the analysis of ratio data simulated from Cauchy and Gamma distributions, model AC again was found to exhibit the greatest power to detect true differences in gene expression (Figure 3). Logistic regressions of the frequency of affirmative significance call on the estimated log factor of difference in gene expression for five datasets from four published studies that publicly reported replicated ratio results for each hybridization. Across all of these experiments, increased replication yielded greater resolution of the statistical significance of small differences in gene expression

Conclusions
Background
Discussion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call