Risk inflation of sequential tests controlled by alpha investing

Dean P Foster, Robert A Stine

doi:10.1080/00949655.2014.990454

Abstract

Streaming feature selection is a greedy approach to variable selection that evaluates potential explanatory variables sequentially. It selects significant features as soon as they are discovered rather than testing them all and picking the best one. Because it is so greedy, streaming selection can rapidly explore large collections of features. If significance is defined by an alpha investing protocol, then the rate of false discoveries will be controlled. The focus of attention in variable selection, however, should be on fit rather than hypothesis testing. Yet little is known about the risk of estimators produced by streaming selection or about how the configuration of these estimators influences that risk. To meet these needs, we provide a computational framework based on stochastic dynamic programming that allows fast calculation of the minimax risk of a sequential estimator relative to an alternative. The alternative can be data driven or derived from an oracle. This framework allows us to compute and contrast the risk inflation of sequential estimators derived from various alpha investing rules. We find that a universal investing rule performs well over a variety of models and that estimators allowed to have larger than conventional rates of false discoveries generally produce smaller risk.
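To make the alpha investing protocol mentioned in the abstract concrete, the following is a minimal Python sketch of how a streaming test sequence might spend and earn alpha-wealth. It is an illustration under stated assumptions, not the estimator studied in the paper: the function name `alpha_investing`, the parameters `initial_wealth`, `payout`, and `spend_fraction`, and the rule of betting a fixed fraction of current wealth on each test are all hypothetical choices made here. In particular, the fixed-fraction rule is not the universal investing rule the abstract refers to. The wealth update (gain a payout on rejection, pay alpha/(1 - alpha) otherwise) follows the commonly stated form of alpha investing.

```python
def alpha_investing(p_values, initial_wealth=0.05, payout=0.05, spend_fraction=0.1):
    """Sequentially test p-values under a simple alpha-investing rule.

    Assumption (illustrative only): each test risks a fixed fraction of the
    current alpha-wealth. Returns the indices of selected features and the
    wealth after each test.
    """
    wealth = initial_wealth
    selected = []
    history = []
    for j, p in enumerate(p_values):
        if wealth <= 0:
            break  # no alpha-wealth left to invest in further tests
        # Amount of wealth lost if this test fails to reject.
        cost = spend_fraction * wealth
        # Level of the test, chosen so that alpha_j / (1 - alpha_j) == cost.
        alpha_j = cost / (1.0 + cost)
        if p <= alpha_j:
            selected.append(j)  # feature is selected as soon as it is significant
            wealth += payout    # a rejection earns additional alpha-wealth
        else:
            wealth -= cost      # a failed test pays alpha_j / (1 - alpha_j)
        history.append(wealth)
    return selected, history


if __name__ == "__main__":
    # Hypothetical p-values for four candidate features, in arrival order.
    chosen, wealth_path = alpha_investing([0.0001, 0.5, 0.9, 0.00001])
    print(chosen)
    print(wealth_path)
```

Because each failed test costs only a fraction of the remaining wealth, this sketch never exhausts its budget, and each rejection raises the level available to later tests. Other spending rules trade off early power against the ability to keep testing, which is the kind of configuration choice whose effect on risk the abstract describes.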

Full Text