Abstract

Feature selection, reproducibility, and model selection are of fundamental importance in contemporary statistics. Feature selection methods are required in a wide range of applications to evaluate the significance of covariates. Reproducibility of the selected features is needed to claim that findings are meaningful and interpretable. Finally, model selection is employed to pinpoint the best set of covariates among a sequence of candidate models produced by feature selection methods.

We show that p-values, a common tool for feature selection, behave differently in nonlinear models and can break down earlier than their linear counterparts. Next, we provide important theoretical foundations for model-X knockoffs, a recent state-of-the-art method for reproducibility, by establishing power and robustness results for it. Finally, we tackle the problem of large-scale model selection for misspecified models and propose a novel information criterion tailored to both model misspecification and high dimensionality.

