ABSTRACT Statistical inference for high-dimensional survival data is important for obtaining valid scientific results in many research areas, including biomedical studies and financial risk management. In this paper, we propose a novel framework for feature selection in the Cox model that controls the false discovery rate (FDR) asymptotically. The key step is to construct a sequence of ranking statistics based on two independent estimators of the regression coefficients. FDR control is achieved by choosing a data-driven threshold along the ranking of these symmetry-based statistics. A de-sparsified estimator and an uneven data-splitting strategy are employed to improve the robustness of the variable selection results and the power in finite samples. We establish the asymptotic FDR control property of the proposed approach at any designated level. Extensive simulation studies and an empirical application to a P2P loan dataset confirm the robustness of the proposed method in FDR control and show that it often achieves higher power than competing methods.
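The thresholding step described above can be illustrated with a minimal sketch. It is not the paper's exact procedure: the form of the ranking statistic (sign agreement times the sum of absolute values), the false discovery proportion estimate, and the function names `mirror_statistics`, `fdr_threshold`, and `select_features` are all assumptions chosen for illustration. The two coefficient vectors are assumed to come from independent fits, for example on the two parts of a data split.

```python
import numpy as np


def mirror_statistics(beta1, beta2):
    """Assumed ranking statistic: positive when the two independent
    estimates agree in sign, with magnitude |beta1| + |beta2|."""
    beta1 = np.asarray(beta1, dtype=float)
    beta2 = np.asarray(beta2, dtype=float)
    return np.sign(beta1 * beta2) * (np.abs(beta1) + np.abs(beta2))


def fdr_threshold(stats, q):
    """Smallest candidate t whose estimated false discovery proportion,
    #{M_j <= -t} / max(#{M_j >= t}, 1), is at most q (assumed estimator).
    Returns np.inf when no candidate qualifies."""
    stats = np.asarray(stats, dtype=float)
    # Candidate thresholds are the nonzero absolute statistics, ascending.
    candidates = np.sort(np.abs(stats[stats != 0]))
    for t in candidates:
        n_neg = np.sum(stats <= -t)           # proxy for false discoveries
        n_pos = max(np.sum(stats >= t), 1)    # number of selections at t
        if n_neg / n_pos <= q:
            return t
    return np.inf


def select_features(beta1, beta2, q=0.1):
    """Indices of features whose statistic reaches the data-driven threshold."""
    stats = mirror_statistics(beta1, beta2)
    t = fdr_threshold(stats, q)
    return np.flatnonzero(stats >= t)


if __name__ == "__main__":
    # Toy coefficient estimates (made up for illustration only).
    b1 = [2.0, 1.5, 0.1, -0.2, 0.0]
    b2 = [1.8, 1.2, -0.1, -0.3, 0.05]
    print(select_features(b1, b2, q=0.1))
```

The sketch relies on the symmetry idea that, for a null feature, the statistic is roughly equally likely to be positive or negative, so the count in the negative tail serves as a stand-in for the number of false selections in the positive tail.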