Floating prioritized subset analysis: A powerful method to detect differentially expressed genes

Wan-Yu Lin, Wen-Chung Lee

doi:10.1016/j.csda.2010.07.023

Abstract

Controlling the false discovery rate (FDR) is a powerful approach to handling a large number of hypothesis tests, as in gene expression data analyses and genome-wide association studies. To further boost power, we propose a floating prioritized subset analysis (floating PSA) that uses prior knowledge more effectively and detects more differentially expressed genes. Genes are first allocated to one of two subsets, a prioritized subset and a non-prioritized subset, according to the investigators’ prior biological knowledge. We allow the FDRs of the two subsets to vary freely (to float) but aim to control the overall FDR at a desired level. We develop an algorithm for the floating PSA that seeks to detect the largest number of true positives. We give theoretical justifications for the algorithm, and computer simulation studies show that the method has good statistical properties. We apply the method to detect genes that are differentially expressed between acute lymphoblastic leukemia and acute myeloid leukemia patients; the floating PSA identifies 32 more genes (permutation-based FDR = 0.0427) than the conventional (fixed) FDR control. In a second example, a colon cancer study, the floating PSA identifies 43 more genes (permutation-based FDR = 0.0502). We recommend the floating PSA for detecting differentially expressed genes in light of its power, robustness, and ease of implementation.
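The idea of letting subset-level thresholds float under a single overall FDR constraint can be illustrated with a small sketch. The code below is not the authors' algorithm, which the abstract does not spell out. It is a simplified stand-in that assumes a Benjamini-Hochberg-style estimate of the overall FDR (expected false positives bounded by subset size times threshold) and searches over per-subset p-value cutoffs. The function names `bh_reject` and `floating_thresholds` are hypothetical and chosen only for illustration.

```python
import numpy as np

def bh_reject(pvals, q):
    """Conventional (fixed) FDR control: Benjamini-Hochberg step-up.
    Returns a boolean mask of rejected hypotheses."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The i-th smallest p-value is compared with q * i / m.
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank passing the test
        reject[order[:k + 1]] = True
    return reject

def floating_thresholds(p_prior, p_other, q):
    """Illustrative search (an assumption, not the published algorithm):
    pick cutoffs t1, t2 for the prioritized and non-prioritized subsets
    that maximize total rejections while keeping the estimated overall
    FDR, (m1*t1 + m2*t2) / rejections, at or below q.
    The estimate conservatively treats every hypothesis as a true null.
    Returns (number of rejections, t1, t2)."""
    p1 = np.sort(np.asarray(p_prior, dtype=float))
    p2 = np.sort(np.asarray(p_other, dtype=float))
    m1, m2 = p1.size, p2.size
    best = (0, 0.0, 0.0)
    # Candidate cutoffs: 0 (reject nothing) or any observed p-value.
    for t1 in np.concatenate(([0.0], p1)):
        r1 = np.searchsorted(p1, t1, side="right")  # count of p1 <= t1
        for t2 in np.concatenate(([0.0], p2)):
            r2 = np.searchsorted(p2, t2, side="right")  # count of p2 <= t2
            total = r1 + r2
            if total == 0:
                continue
            est_fdr = (m1 * t1 + m2 * t2) / total
            if est_fdr <= q and total > best[0]:
                best = (int(total), float(t1), float(t2))
    return best
```

As a toy illustration, take three prioritized genes with p-values 0.01, 0.02, and 0.03 and seventeen non-prioritized genes with p-values spread evenly between 0.5 and 0.98. At q = 0.05, `bh_reject` on the pooled twenty p-values rejects nothing, whereas `floating_thresholds` rejects all three prioritized genes by setting the non-prioritized cutoff to zero. The nested search costs on the order of m1 × m2 evaluations, so it suits small examples only.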

Full Text