Analyzing data from large-scale, multi-experiment studies requires scientists both to analyze each experiment and to assess the results as a whole. In this article, we develop double empirical Bayes testing (DEBT), an empirical Bayes method for analyzing multi-experiment studies when many covariates are gathered per experiment. DEBT is a two-stage method: in the first stage, it reports which experiments yielded significant outcomes; in the second stage, it hypothesizes which covariates drive the experimental significance. In both of its stages, DEBT builds on Efron (2008), which lays out an elegant empirical Bayes approach to testing. DEBT enhances this framework by learning a series of black-box predictive models to boost power and control the false discovery rate (FDR). In Stage 1, it uses a prior parameterized by a deep neural network to report which experiments yielded significant outcomes. In Stage 2, it uses an empirical Bayes version of the knockoff filter (Candes et al., 2018) to select covariates with significant predictive power for Stage-1 significance. In both simulated and real data, DEBT discovers a greater proportion of significant outcomes and selects more features when signals are weak. In a real study of cancer cell lines, DEBT selects a robust set of biologically plausible genomic drivers of drug sensitivity and resistance in cancer.
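The two-stage structure described above can be sketched in a toy form. This is not the authors' method: it swaps the deep-neural-network prior for a fixed Gaussian two-groups model (in the spirit of Efron, 2008) and the empirical Bayes knockoff filter for a plain correlation screen; all function names and parameter values are illustrative assumptions.

```python
# Hypothetical sketch of a DEBT-like two-stage pipeline (not the authors' code).
# Stage 1: empirical Bayes two-groups testing on per-experiment z-scores,
#          with a fixed Gaussian prior standing in for the learned neural prior.
# Stage 2: a simple correlation screen standing in for the knockoff filter,
#          selecting covariates associated with Stage-1 significance calls.
import numpy as np

def stage1_lfdr(z, pi0=0.9):
    """Local FDR under a N(0,1) null and an assumed N(0, 3^2) marginal."""
    null = np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)
    marginal = np.exp(-z**2 / 18) / np.sqrt(18 * np.pi)
    return np.clip(pi0 * null / marginal, 0, 1)

def stage2_screen(X, significant, k=2):
    """Rank covariates by linear association with Stage-1 calls; keep top k."""
    scores = np.abs((X - X.mean(0)).T @ (significant - significant.mean()))
    return np.argsort(scores)[::-1][:k]

# Simulated multi-experiment study: covariate 0 drives the outcome.
rng = np.random.default_rng(0)
n, p = 500, 10
X = rng.normal(size=(n, p))
active = X[:, 0] > 1.0                      # experiments with a true effect
z = rng.normal(size=n) + 3.0 * active       # per-experiment test statistics

lfdr = stage1_lfdr(z)
significant = (lfdr < 0.2).astype(float)    # Stage-1 discoveries
selected = stage2_screen(X, significant)    # Stage-2 covariate selection
print(sorted(selected.tolist()))
```

In the real method, Stage 1 learns the prior from the data with a neural network, and Stage 2 constructs knockoff copies of the covariates to calibrate the selection and control the FDR.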