Bias in the estimation of false discovery rate in microarray studies

Y Pawitan, K R K Murthy, S Michiels, A Ploner

doi:10.1093/bioinformatics/bti626

Abstract

The false discovery rate (FDR) provides a key statistical assessment for microarray studies. Its value depends on the proportion pi(0) of non-differentially expressed (non-DE) genes. In most microarray studies, many genes have small effects that are not easily separable from non-DE genes. As a result, current methods often overestimate pi(0) and the FDR, leading to unnecessary loss of power in the overall analysis. For the common two-sample comparison, we derive a natural mixture model of the test statistic and an explicit formula for the bias in the standard estimation of pi(0). We suggest an improved estimation of pi(0) based on the mixture model and describe a practical likelihood-based procedure for this purpose. The analysis shows that a large bias occurs when pi(0) is far from 1 and when the non-centrality parameters of the distribution of the test statistic are near zero. The theoretical result also explains substantial discrepancies between non-parametric and model-based estimates of pi(0). Simulation studies indicate that mixture-model estimates are less biased than standard estimates. The method is applied to breast cancer and lymphoma data examples. An R package, OCplus, containing functions to compute pi(0) based on the mixture model, the resulting FDR and other operating characteristics of microarray data, is freely available at http://www.meb.ki.se/~yudpaw. Contact: yudi.pawitan@meb.ki.se and alexander.ploner@meb.ki.se.
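The overestimation of pi(0) described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' mixture-model procedure and does not use the OCplus package. It assumes that the "standard" estimate is the common threshold-based estimator (the share of p-values above a cutoff, rescaled by one minus the cutoff), that the test statistics are normal with unit variance, and that all DE genes share one effect size. The function names, the cutoff of 0.5, and the simulation settings are hypothetical choices made for illustration only.

```python
import math
import numpy as np


def two_sided_p(z):
    """Two-sided p-values for statistics assumed N(0, 1) under the null."""
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])


def pi0_standard(p, lam=0.5):
    """Threshold-based estimate (an assumed stand-in for the 'standard' one):
    fraction of p-values above lam, divided by (1 - lam), capped at 1."""
    return min(1.0, float(np.mean(p > lam)) / (1.0 - lam))


def simulate(pi0, effect, m=20000, seed=1):
    """Simulate m genes: a fraction pi0 is non-DE (mean 0), the rest are DE
    with a common mean shift `effect`. Returns the estimate of pi0."""
    rng = np.random.default_rng(seed)
    m0 = int(round(pi0 * m))
    means = np.concatenate([np.zeros(m0), np.full(m - m0, effect)])
    z = rng.normal(loc=means, scale=1.0)
    return pi0_standard(two_sided_p(z))


# True pi(0) is 0.6 in both runs; only the DE effect size differs.
est_small = simulate(pi0=0.6, effect=0.5)  # effects near zero
est_large = simulate(pi0=0.6, effect=4.0)  # well-separated effects

print(f"small effects: estimated pi0 = {est_small:.3f}")
print(f"large effects: estimated pi0 = {est_large:.3f}")
```

Under these assumptions, DE genes with small effects still produce many large p-values, so they are counted as if they were non-DE and the estimate is pushed well above the true value of 0.6. With large effects, the estimate stays close to 0.6.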
