A Tight Prediction Interval for False Discovery Proportion under Dependence

Shulian Shang, Mengling Liu, Yongzhao Shao

doi:10.4236/ojs.2012.22018

Abstract

The false discovery proportion (FDP) is a useful measure of the abundance of false positives when a large number of hypotheses are being tested simultaneously. Methods for controlling the expected value of the FDP, namely the false discovery rate (FDR), have become widely used. It is highly desirable to have an accurate prediction interval for the FDP in such applications. Some degree of dependence among test statistics exists in almost all applications involving multiple testing. Methods for constructing tight prediction intervals for the FDP that take account of dependence among test statistics are therefore of great practical importance. This paper derives a formula for the variance of the FDP and uses it to obtain an upper prediction interval for the FDP, under some semi-parametric assumptions on dependence among test statistics. Simulation studies indicate that the proposed formula-based prediction interval has good coverage probability under commonly assumed weak dependence. The prediction interval is generally more accurate than those obtained from existing methods. In addition, a permutation-based upper prediction interval for the FDP is provided, which can be useful when dependence is strong and the number of tests is not too large. The proposed prediction intervals are illustrated using a prostate cancer dataset.

Highlights

When a large number of hypotheses are tested simultaneously, a direct measure of the abundance of false positive findings is the false discovery proportion (FDP), defined as $Q = V/(R \vee 1)$, where $R$ denotes the total number of rejections, $V$ denotes the number of rejections of true null hypotheses, and $R \vee 1 = \max(R, 1)$.
Methods for constructing tight prediction intervals for the FDP that take account of dependence among test statistics are of great practical importance.
This paper derives a formula for the variance of the FDP and uses it to obtain an upper prediction interval for the FDP, under some semi-parametric assumptions on dependence among test statistics.
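
As a minimal sketch of the FDP definition above, the following Python function computes V / max(R, 1) from two boolean lists. The function name and argument names are illustrative assumptions, not notation from the paper, and in real data the true-null labels are unknown, which is why the FDP has to be predicted rather than observed.

```python
def false_discovery_proportion(rejected, is_true_null):
    """Return V / max(R, 1) for one multiple-testing experiment.

    rejected[i]     -- True if hypothesis i was rejected
    is_true_null[i] -- True if hypothesis i is a true null
                       (known only in simulations; hypothetical input)
    """
    # R: total number of rejections
    R = sum(1 for r in rejected if r)
    # V: number of rejected hypotheses that are true nulls (false discoveries)
    V = sum(1 for r, t in zip(rejected, is_true_null) if r and t)
    # max(R, 1) makes the FDP equal to 0 when nothing is rejected
    return V / max(R, 1)


if __name__ == "__main__":
    rejected = [True, True, False, True]
    is_true_null = [True, False, True, False]
    print(false_discovery_proportion(rejected, is_true_null))
```

The max(R, 1) denominator avoids division by zero: an experiment with no rejections has no false discoveries, so its FDP is taken to be 0.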

Summary

Introduction

Suppose a study is properly designed to control the FDR at 5%. If such a study is independently repeated many times, the average of the FDPs in these repeated studies can be expected to be no more than 5%. The FDP realized in any single study, however, may be considerably larger. Even when a study is designed to control the FDR under common designs, it is therefore still highly desirable to assess the FDP, e.g., by constructing a prediction interval for it. One can also consider designing a study to control the FDP instead of the FDR. However, confidence envelopes from the existing FDP-controlling procedures are often too conservative for predicting a tight range for the FDP. When weak correlations exist among test statistics, methods for constructing tight prediction intervals for the FDP are still limited.
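
The distinction between the FDR and the realized FDP can be illustrated with a small simulation. The sketch below assumes independent one-sided z-tests and uses the Benjamini-Hochberg step-up procedure as the FDR-controlling method; the number of tests, the fraction of true nulls, and the effect size are arbitrary illustrative choices, and the empirical quantile it reports is a crude stand-in, not the formula-based or permutation-based interval proposed in the paper. Independence is also a simplification of the dependent setting the paper addresses.

```python
import numpy as np
from math import erfc, sqrt


def bh_reject(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure; returns a boolean rejection mask."""
    m = len(pvals)
    order = np.argsort(pvals)
    sorted_p = pvals[order]
    # Find the largest k with p_(k) <= q * k / m
    below = sorted_p <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[:k + 1]] = True
    return reject


def simulate_fdps(m=200, m0=160, effect=3.0, q=0.05, reps=500, seed=0):
    """Repeat an independent-tests study `reps` times and record each FDP.

    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    is_null = np.arange(m) < m0          # first m0 hypotheses are true nulls
    fdps = np.empty(reps)
    for i in range(reps):
        z = rng.standard_normal(m)
        z[~is_null] += effect            # shift the non-null test statistics
        # One-sided p-value: P(Z >= z) = 0.5 * erfc(z / sqrt(2))
        p = np.array([0.5 * erfc(x / sqrt(2)) for x in z])
        rej = bh_reject(p, q)
        V = int(np.sum(rej & is_null))   # false rejections
        R = int(np.sum(rej))             # total rejections
        fdps[i] = V / max(R, 1)
    return fdps


if __name__ == "__main__":
    fdps = simulate_fdps()
    print("mean FDP (estimates the FDR):", fdps.mean())
    print("empirical 95th percentile of FDP:", np.quantile(fdps, 0.95))
```

Under these assumptions the average FDP should stay below the nominal 5% level, while the upper quantile of the simulated FDPs shows how far a single study can exceed it. That gap is what an upper prediction interval for the FDP is meant to quantify.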

Objectives

Methods

Conclusion