Abstract

We propose a convenient moment-based procedure for testing the omnibus null hypothesis of no contamination of a central chi-square distribution by a non-central chi-square distribution. In sharp contrast with likelihood ratio tests for mixture models, there is no need for re-sampling or random field theory to obtain critical values. Rather, critical values are available from an asymptotic normal distribution, and there is excellent agreement between nominal and actual significance levels. This procedure may be used to model numerous chi-square statistics, obtained via monotonic transformations of F statistics, from large-scale ANOVA testing, such as that encountered in microarray data analysis. In that context, modeling chi-square statistics instead of p-values may improve detection of differential gene expression, as we demonstrate through simulation studies, while also reducing false declarations of the same, as we illustrate in a case study on aging and cognition. Our procedure may also be incorporated into a gene filtration process, which may reduce Type II errors on genewise null hypotheses by justifying lighter controls for Type I errors.

Highlights

  • Consider the mixture model [1,2,3], with probability density function $(1-\lambda)\,\chi^2_\nu(0) + \lambda\,\chi^2_\nu(\mu)$ (1), where $0 \le \lambda \le 1$, $\chi^2_\nu(0)$ denotes the central chi-square pdf on $\nu > 0$ degrees of freedom (df), and $\chi^2_\nu(\mu)$ denotes the chi-square pdf on $\nu$ df with non-centrality parameter $\mu \ge 0$

  • Employing the Contaminated Chi-square (CCS) model to analyze chi-square statistics, instead of the Contaminated Beta (CB) model to assess p-values, resolves the aforementioned concern, because the omnibus null hypothesis from (2) is not rejected for the genes eliminated in step 3

  • We have developed a convenient procedure for testing the omnibus null hypothesis of no contamination of a central chi-square distribution by a non-central chi-square distribution
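
One way to see why a moment-based test can target this hypothesis is to write out the mean of the mixture in (1). The lines below use only the standard mean and variance of a (non-central) chi-square variable. The statistic $Z_n$ is an illustrative first-moment statistic for independent $X_1, \ldots, X_n$, introduced here as an assumption; the text above does not state the exact form of the authors' test statistic.

```latex
\begin{align*}
\mathbb{E}[X] &= (1-\lambda)\,\nu + \lambda\,(\nu+\mu) = \nu + \lambda\mu, \\
\lambda\mu = 0 \;&\Longrightarrow\; \mathbb{E}[X] = \nu, \quad \operatorname{Var}(X) = 2\nu, \\
Z_n &= \frac{\sqrt{n}\,\bigl(\bar{X}_n - \nu\bigr)}{\sqrt{2\nu}}
      \;\xrightarrow{d}\; N(0,1) \quad \text{under } \lambda\mu = 0 .
\end{align*}
```

Since $\lambda\mu \ge 0$, contamination can only push the mean above $\nu$, so a one-sided comparison of $Z_n$ with a standard normal critical value would be the natural choice for a statistic of this kind.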


Summary

Introduction

Consider the Contaminated Chi-square (CCS) mixture model [1,2,3], with probability density function $(1-\lambda)\,\chi^2_\nu(0) + \lambda\,\chi^2_\nu(\mu)$ (1), where $0 \le \lambda \le 1$, $\chi^2_\nu(0)$ denotes the central chi-square pdf on $\nu > 0$ degrees of freedom (df), and $\chi^2_\nu(\mu)$ denotes the chi-square pdf on $\nu$ df with non-centrality parameter $\mu \ge 0$. To understand how the CCS model and the omnibus null hypothesis relate to large-scale ANOVA testing, suppose that a microarray experiment [4,5] is performed to measure expression levels on each of $n$ genes for subjects in independent samples of sizes $g_1, g_2, \ldots, g_K$ from $K$ populations. Each gene yields an F statistic, which is rescaled into an approximate chi-square statistic. Letting $\lambda$ denote the proportion of genes for which mean expression levels are not equal across the $K$ populations, we may regard the collection of rescaled test statistics $X_1, X_2, \ldots, X_n$ as a sample from the CCS model with $\nu = K-1$. If mean expression levels are equal across the $K$ populations for all genes, the CCS model reduces to $\chi^2_{K-1}(0)$. This is why $\lambda\mu = 0$ is referred to as the omnibus null hypothesis. An appendix explains the rescaling of F statistics into approximate chi-square statistics.
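
As a rough illustration of the setup described above, the following Python sketch simulates statistics from the CCS model and computes a simple first-moment z statistic. It relies on the fact that, when $\lambda\mu = 0$, each statistic has mean $\nu$ and variance $2\nu$. Everything named here (`sample_ccs`, `first_moment_z`, `upper_tail_p`, and the values of n, K, λ and μ) is hypothetical and chosen for illustration; this is not the authors' procedure, whose exact statistic is not given in this excerpt. The sampler assumes an integer $\nu \ge 1$.

```python
import math
import random


def sample_ccs(n, nu, lam, mu, rng):
    """Draw n values from (1 - lam) * chi2_nu(0) + lam * chi2_nu(mu).

    Assumes integer nu >= 1, so each draw is a sum of nu squared normals;
    the non-centrality mu is placed entirely on the first normal's mean.
    """
    draws = []
    for _ in range(n):
        # With probability lam the statistic comes from the non-central component.
        shift = math.sqrt(mu) if rng.random() < lam else 0.0
        x = (rng.gauss(0.0, 1.0) + shift) ** 2
        x += sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu - 1))
        draws.append(x)
    return draws


def first_moment_z(xs, nu):
    """Illustrative z statistic: under lam * mu = 0 the mean is nu, the variance 2 * nu."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(n) * (mean - nu) / math.sqrt(2.0 * nu)


def upper_tail_p(z):
    """One-sided p-value from the standard normal upper tail."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


rng = random.Random(0)  # fixed seed so the run is reproducible
nu = 3                  # K - 1 with K = 4 populations (hypothetical)

null_sample = sample_ccs(5000, nu, lam=0.0, mu=0.0, rng=rng)
contaminated = sample_ccs(5000, nu, lam=0.1, mu=4.0, rng=rng)

z_null = first_moment_z(null_sample, nu)
z_alt = first_moment_z(contaminated, nu)
print(f"null:         z = {z_null:.2f}, p = {upper_tail_p(z_null):.3f}")
print(f"contaminated: z = {z_alt:.2f}, p = {upper_tail_p(z_alt):.3g}")
```

With 10% of the simulated genes contaminated at μ = 4, the mixture mean shifts from 3 to 3.4, which is large relative to the standard error at n = 5000, so the contaminated sample should produce a much larger z than the null sample.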

