Abstract
We consider the problem of robustifying high-dimensional structured estimation. Robust techniques are key in real-world applications which often involve outliers and data corruption. We focus on trimmed versions of structurally regularized M-estimators in the high-dimensional setting, including the popular Least Trimmed Squares estimator, as well as analogous estimators for generalized linear models and graphical models, using possibly non-convex loss functions. We present a general analysis of their statistical convergence rates and consistency, and then take a closer look at the trimmed versions of the Lasso and Graphical Lasso estimators as special cases. On the optimization side, we show how to extend algorithms for M-estimators to fit trimmed variants and provide guarantees on their numerical convergence. The generality and competitive performance of high-dimensional trimmed estimators are illustrated numerically on both simulated and real-world genomics data.
Highlights
We consider the problem of high-dimensional estimation, where the number of variables p may greatly exceed the number of observations n
For matrix-structured regression problems, estimators using nuclear-norm regularization have been studied, e.g., by Recht et al. (2010). Another prime example is that of sparse inverse covariance estimation for graphical model selection (Ravikumar et al. 2011)
We focus on Gaussian graphical models and provide the statistical guarantees of our Trimmed Graphical Lasso estimator as presented in Section 2 (Motivating Example 2)
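To give a feel for trimming in the graphical-model setting, the following is a minimal, dependency-free sketch of the trimming principle for Gaussian precision estimation: alternate between keeping the h samples with the highest Gaussian log-likelihood under the current precision estimate and re-estimating the precision from that subset. The function name `trimmed_precision`, the ridge-regularized inverse standing in for the actual l1-penalized graphical-lasso step, and all parameter choices are illustrative assumptions, not the estimator analyzed in the paper.

```python
import numpy as np

def trimmed_precision(Xc, h, ridge=0.1, n_iter=10):
    """Alternating sketch of trimmed precision estimation.

    Xc : (n, p) array of centered observations.
    h  : number of samples to keep (n - h samples are trimmed).

    NOTE: the l1-penalized graphical-lasso refit is replaced by a
    ridge-regularized inverse to keep the sketch self-contained.
    """
    n, p = Xc.shape
    theta = np.linalg.inv(np.cov(Xc, rowvar=False) + ridge * np.eye(p))
    for _ in range(n_iter):
        # Per-sample log-likelihood differs only in the quadratic form
        # x' Theta x (logdet(Theta) is shared), so the h highest-likelihood
        # samples are those with the smallest Mahalanobis distances.
        quad = np.einsum('ij,jk,ik->i', Xc, theta, Xc)
        keep = np.argsort(quad)[:h]
        S = np.cov(Xc[keep], rowvar=False)
        theta = np.linalg.inv(S + ridge * np.eye(p))
    return theta
```

With gross outliers present, the trimming step tends to exclude them after the first pass, so the final estimate is driven by the clean samples.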
Summary
We consider the problem of high-dimensional estimation, where the number of variables p may greatly exceed the number of observations n. The development and statistical analysis of structurally constrained estimators for high-dimensional estimation has recently attracted considerable attention. These estimators seek to minimize the sum of a loss function and a weighted regularizer. The desirable theoretical properties of such regularized M-estimators can be compromised, since outliers and corruptions are often present in high-dimensional data problems. These challenges motivate the development of robust structured learning methods that can cope with observations deviating from the model assumptions. The Least Median of Squares estimator originally proposed by Rousseeuw (1984) avoids this problem, reaching a breakdown point of nearly 50%; a closely related approach 'trims' a portion of the largest residuals. This led to the consideration of sparse Least Trimmed Squares (sparse LTS) for robust high-dimensional estimation.
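The trimming idea above can be sketched in a few lines: alternate between (1) keeping the h observations with the smallest squared residuals under the current fit and (2) solving a Lasso on that subset. This is a minimal illustration of the sparse-LTS principle, not the paper's algorithm; the function names, the ISTA (proximal-gradient) inner solver, and all parameter defaults are assumptions made for the sketch.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of t*||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_lts(X, y, h, lam, n_outer=20, n_ista=200):
    """Alternating sketch of sparse Least Trimmed Squares.

    Minimizes (approximately) the sum of the h smallest squared
    residuals plus lam * ||beta||_1, by alternating subset selection
    with an ISTA Lasso fit on the kept observations.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_outer):
        resid = y - X @ beta
        keep = np.argsort(resid ** 2)[:h]          # h smallest residuals
        Xs, ys = X[keep], y[keep]
        step = 1.0 / np.linalg.norm(Xs, 2) ** 2    # 1/L, L = Lipschitz const.
        for _ in range(n_ista):
            grad = Xs.T @ (Xs @ beta - ys)
            beta = soft_threshold(beta - step * grad, step * lam)
    return beta
```

Because the n - h observations with the largest residuals are excluded at each outer step, a moderate fraction of grossly corrupted responses does not pull the fit away from the sparse ground truth.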