Local false discovery rate estimation with competition-based procedures for variable selection.

Xiaoya Sun,Yan Fu

doi:10.1002/sim.9942

Abstract

Multiple hypothesis testing has been widely applied to problems dealing with high-dimensional data, for example, the selection of important variables or features from a large number of candidates while controlling the error rate. The most prevailing measure of error rate used in multiple hypothesis testing is the false discovery rate (FDR). In recent years, the local false discovery rate (fdr) has drawn much attention, due to its advantage of accessing the confidence of individual hypotheses. However, most methods estimate fdr through -values or statistics with known null distributions, which are sometimes unavailable or unreliable. Adopting the innovative methodology of competition-based procedures, for example, the knockoff filter, this paper proposes a new approach, named TDfdr, to fdr estimation, which is free of -values or known null distributions. Extensive simulation studies demonstrate that TDfdr can accurately estimate the fdr with two competition-based procedures. We applied the TDfdr method to two real biomedical tasks. One is to identify significantly differentially expressed proteins related to the COVID-19 disease, and the other is to detect mutations in the genotypes of HIV-1 that are associated with drug resistance. Higher discovery power was observed compared to existing popular methods.

Full Text