A Look at Multiplicity Through Misclassification

Nairanjana Dasgupta, Nicole A Lazar, Alan Genz

doi:10.1007/s13571-015-0110-6

Abstract

Multiplicity in large-scale studies using, for example, microarray genomic data and functional neuroimaging data has been an extensively researched topic in recent years. One option often used by researchers in practice is a "top-r table", which involves ranking the hypotheses in some order (by p-values or test statistics) and reporting the top r results. This has immediate practical value: it yields a list of "interesting" results that are worth following up, irrespective of the actual p-value (adjusted or not). In this manuscript we take another look at multiplicity using top-r tables. Our approach is intended to be a compromise between theory and practice. We look at the relationship between the value of r and the probability of correct classification, which we call r-power (the probability that the units picked in the top-r table do indeed come from the alternative). We define r-power analytically in terms of order statistics and so quantify the probability of correct classification. We use numerical integration to calculate r-power as a function of the effect size, δ; the number of hypotheses tested, N; the number of hypotheses coming from the null, k; and r. Our results indicate that r-power is positively related to effect size and negatively related to k/N. The relationship to r depends upon whether r<k. There are two possible uses of our results: given a pre-chosen r-power, we can calculate r and so decide on the number of hypotheses to be followed up; or, if r is chosen using some other criterion, we can use our method to calculate the r-power in that context. We illustrate these ideas using examples from microarray and functional magnetic resonance imaging data.
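
To make the idea of r-power concrete, the following is a minimal Monte Carlo sketch, not the numerical-integration method used in the paper. It rests on several assumptions that the abstract does not state: the test statistics are independent, null statistics are N(0, 1), alternative statistics are N(δ, 1), ranking is one-sided (largest first), and r-power is read as the probability that all r top-ranked units come from the alternative. The function name `simulate_r_power` and its parameters are hypothetical. Under this reading, the top-r table is entirely correct exactly when the r-th largest alternative statistic exceeds the largest null statistic, which is where order statistics enter.

```python
import numpy as np


def simulate_r_power(n_total, n_null, r, delta, n_sims=20000, seed=0):
    """Monte Carlo estimate of P(all top-r ranked units are alternatives).

    Assumed model (illustrative only): independent statistics,
    nulls ~ N(0, 1), alternatives ~ N(delta, 1), largest ranked first.
    """
    if not 0 <= n_null <= n_total:
        raise ValueError("n_null must lie between 0 and n_total")
    if not 0 < r <= n_total:
        raise ValueError("r must lie between 1 and n_total")

    n_alt = n_total - n_null
    if r > n_alt:
        # More slots than alternatives: at least one null must appear.
        return 0.0
    if n_null == 0:
        # No nulls at all: every selected unit is an alternative.
        return 1.0

    rng = np.random.default_rng(seed)
    null_stats = rng.standard_normal((n_sims, n_null))
    alt_stats = rng.standard_normal((n_sims, n_alt)) + delta

    # r-th largest alternative statistic in each simulated study
    # (index n_alt - r in ascending order).
    kth = n_alt - r
    rth_largest_alt = np.partition(alt_stats, kth, axis=1)[:, kth]
    largest_null = null_stats.max(axis=1)

    # The top-r table is all-alternative iff the r-th largest
    # alternative beats every null statistic.
    return float(np.mean(rth_largest_alt > largest_null))


if __name__ == "__main__":
    for delta in (1.0, 2.0, 3.0, 4.0):
        est = simulate_r_power(n_total=100, n_null=80, r=5, delta=delta)
        print(f"delta={delta:.1f}  estimated r-power={est:.3f}")
```

Running the loop at the bottom for increasing δ, or repeating it with a larger `n_null`, gives a quick qualitative check of the reported trends (r-power rising with effect size and falling with k/N) under the assumed model only.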

Full Text