Variable selection with false discovery rate control in deep neural networks

Zixuan Song, Jun Li

doi:10.1038/s42256-021-00308-z

Open Access

Abstract

Deep neural networks are known for their high prediction accuracy, but also for their black-box nature and poor interpretability. We consider the problem of variable selection in deep neural networks, that is, selecting the input variables that have significant predictive power for the output. Most existing variable selection methods for neural networks are applicable only to shallow networks or are computationally infeasible on large datasets; moreover, they lack control over the quality of the selected variables. Here we propose a backward elimination procedure called SurvNet, which is based on a new measure of variable importance that applies to a wide variety of networks. More importantly, SurvNet is able to estimate and control the false discovery rate of the selected variables empirically. Further, SurvNet adaptively determines how many variables to eliminate at each step in order to maximize selection efficiency. The validity and efficiency of SurvNet are shown on various simulated and real datasets, and its performance is compared with that of other methods. In particular, a systematic comparison with knockoff-based methods shows that although they have more rigorous false discovery rate control on data with strong variable correlation, SurvNet usually has higher power.

Identifying salient input features can be a challenge in neural networks. The authors developed a variable selection procedure with false discovery rate control that works on classification or regression problems, one or multiple output neurons, and deep or shallow neural networks.
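The general idea of backward elimination with an empirically estimated false discovery rate can be illustrated with a minimal sketch. This is not the SurvNet procedure itself: the abstract does not specify the importance measure or the estimator, so everything below is an assumption made for illustration. The function name `backward_eliminate` is hypothetical, a least-squares linear model stands in for the neural network, the importance score is the absolute standardized coefficient, null variables are created by permuting each input column, and the false discovery rate is estimated as the number of surviving null columns divided by the number of surviving original columns.

```python
import numpy as np


def backward_eliminate(X, y, target_fdr=0.1, seed=0):
    """Illustrative backward elimination with an empirical FDR estimate.

    Assumptions (not taken from the paper): a linear least-squares model
    replaces the network, and permuted copies of the inputs act as known
    null variables.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # One permuted (hence uninformative) copy of every input column.
    nulls = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
    Z = np.hstack([X, nulls])
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)  # standardize columns

    is_null = np.arange(2 * p) >= p  # columns p..2p-1 are the null copies
    active = np.arange(2 * p)        # indices of variables still in the model

    while True:
        n_null = int(is_null[active].sum())
        n_orig = active.size - n_null
        if n_orig == 0:
            return np.array([], dtype=int), 0.0

        # Surviving nulls serve as a rough count of false discoveries.
        est_fdr = n_null / n_orig
        if est_fdr <= target_fdr:
            return active[~is_null[active]], est_fdr

        # Refit on the active variables and score each by |coefficient|.
        coef, *_ = np.linalg.lstsq(Z[:, active], y - y.mean(), rcond=None)
        importance = np.abs(coef)

        # Adaptive step size: remove more variables when far from the target.
        n_drop = max(1, int(np.ceil((est_fdr - target_fdr) * n_orig)))
        n_drop = min(n_drop, active.size - 1)
        keep = np.argsort(importance)[n_drop:]
        active = np.sort(active[keep])


# Example on synthetic data where only the first three inputs matter.
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.normal(size=(500, 20))
y_demo = (3 * X_demo[:, 0] - 2 * X_demo[:, 1] + 2.5 * X_demo[:, 2]
          + demo_rng.normal(size=500))
selected, fdr_hat = backward_eliminate(X_demo, y_demo, target_fdr=0.1)
print("selected columns:", selected.tolist(), "estimated FDR:", fdr_hat)
```

The step size in this sketch shrinks as the estimated rate approaches the target, which is one simple way to trade speed against overshooting; the rule actually used by SurvNet may differ.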
