Abstract
Feature selection, reproducibility, and model selection are of fundamental importance in contemporary statistics. Feature selection methods are required in a wide range of applications to evaluate the significance of covariates, while reproducibility of the selected features is needed to claim that findings are meaningful and interpretable. Finally, model selection is employed to pinpoint the best set of covariates among a sequence of candidate models produced by feature selection methods.

We show that p-values, a common tool for feature selection, behave differently in nonlinear models and can break down earlier, at lower dimensionality, than their linear counterparts. Next, we provide important theoretical foundations for model-X knockoffs, a recent state-of-the-art method for reproducible feature selection, and establish power and robustness results for the procedure. Finally, we tackle the large-scale model selection problem for misspecified models, proposing a novel information criterion tailored to both model misspecification and high dimensionality.