Abstract

Generalized linear models (GLM) and generalized additive models (GAM) are popular statistical methods for modelling continuous and discrete data, both parametrically and nonparametrically. In this general framework, we consider the problem of variable selection by studying a wide class of penalized M-estimators that are particularly well suited for high-dimensional scenarios, where the number of covariates $p$ is very large relative to the sample size $n$. We focus on resistance issues in the presence of deviations from the stochastic assumptions of the postulated models and highlight the weaknesses of widely used estimators. We advocate the need for robust estimators and propose several penalized quasilikelihood estimators that achieve both good statistical properties at the assumed model and stability in a neighborhood of it. Specifically, we provide careful asymptotic analyses of our robust estimators for GLM and GAM when the number of parameters increases with the sample size. We start by revisiting the asymptotics of M-estimators for GLM with a diverging number of parameters. We establish the asymptotic normality of these estimators and reexamine distributional results for likelihood-ratio-type and Wald-type tests based on them. We then consider penalized M-estimators for high-dimensional settings where $p \gg n$. In the GLM setting, we show that, under regularity conditions, our estimators are consistent, asymptotically normally distributed, and variable selection consistent; furthermore, they have a bounded bias in a neighborhood of the model. In the GAM setting, we establish an $\ell_2$-norm consistency result for the nonparametric components that achieves the optimal rates of convergence. In addition, the proposed penalized estimator selects the correct model consistently.
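Schematically, the penalized M-estimators considered here take the familiar form (the notation is illustrative; $\rho$ denotes a generic, possibly bounded, loss function and $p_\lambda$ a sparsity-inducing penalty with tuning parameter $\lambda$, not the specific choices made in the thesis):
\[
\hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^p} \; \frac{1}{n}\sum_{i=1}^{n} \rho\!\left(y_i, x_i^{\top}\beta\right) \;+\; \sum_{j=1}^{p} p_\lambda\!\left(|\beta_j|\right),
\]
where robustness is obtained by bounding the influence of the loss term and sparsity by the choice of $p_\lambda$.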
We propose new algorithms for the implementation of our penalized M-estimators and illustrate the finite-sample performance of our methods, at the model and under contamination, in simulation studies. An important contribution of this thesis is to formally study the local robustness properties of general nondifferentiable penalized M-estimators. In particular, we propose a framework that allows us to rigorously define the influence function as the limiting influence function of a sequence of approximating functionals. We show that this influence function can be used to characterize the robustness properties of a wide range of sparse estimators and that it can be viewed as a derivative in the sense of distribution theory. At the end of this thesis, we discuss some extensions of our work and give an overview of the future challenges of robust statistics in high dimensions.
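For reference, the classical influence function that this limiting construction extends to nondifferentiable penalized functionals is the standard Gâteaux-type derivative of a statistical functional $T$ at a distribution $F$:
\[
\mathrm{IF}(z;\, T, F) \;=\; \lim_{\varepsilon \downarrow 0} \frac{T\!\left((1-\varepsilon)F + \varepsilon\,\Delta_z\right) - T(F)}{\varepsilon},
\]
where $\Delta_z$ is the point mass at $z$; a bounded influence function indicates local robustness to small contaminations of $F$.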
