Abstract

We consider the problem of robustifying high-dimensional structured estimation. Robust techniques are key in real-world applications which often involve outliers and data corruption. We focus on trimmed versions of structurally regularized M-estimators in the high-dimensional setting, including the popular Least Trimmed Squares estimator, as well as analogous estimators for generalized linear models and graphical models, using possibly non-convex loss functions. We present a general analysis of their statistical convergence rates and consistency, and then take a closer look at the trimmed versions of the Lasso and Graphical Lasso estimators as special cases. On the optimization side, we show how to extend algorithms for M-estimators to fit trimmed variants and provide guarantees on their numerical convergence. The generality and competitive performance of high-dimensional trimmed estimators are illustrated numerically on both simulated and real-world genomics data.
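As a concrete illustration of the trimming idea described above, here is a minimal sketch of fitting a sparse Least Trimmed Squares model by alternating minimization. This is not the authors' implementation; it assumes scikit-learn's Lasso for the inner regularized fit and a simple subset-update rule for the trimming step:

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_lts(X, y, h, lam, n_iter=50, seed=0):
    """Minimal sparse Least Trimmed Squares sketch via alternating minimization.

    Alternates between (1) selecting the h observations with the smallest
    squared residuals under the current fit and (2) refitting a Lasso on
    that subset. A heuristic sketch of the trimming idea, not the paper's
    algorithm or its guarantees.
    """
    rng = np.random.default_rng(seed)
    subset = rng.choice(len(y), size=h, replace=False)  # random initial subset
    model = Lasso(alpha=lam)
    for _ in range(n_iter):
        model.fit(X[subset], y[subset])                 # Lasso on current "clean" subset
        residuals = (y - model.predict(X)) ** 2         # squared residuals on all n points
        new_subset = np.argsort(residuals)[:h]          # keep the h best-fit observations
        if set(new_subset) == set(subset):              # subset stabilized: local optimum
            break
        subset = new_subset
    return model.coef_, subset
```

Each iteration refits on the h currently best-fitting observations, so corrupted points with large residuals are progressively excluded. The paper's contribution is the statistical and numerical convergence analysis of such trimmed estimators, which this loop does not capture.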

Highlights

  • We consider the problem of high-dimensional estimation, where the number of variables p may greatly exceed the number of observations n

  • For matrix-structured regression problems, estimators using nuclear-norm regularization have been studied, e.g., by Recht et al. (2010). Another prime example is that of sparse inverse covariance estimation for graphical model selection (Ravikumar et al., 2011)

  • We focus on Gaussian graphical models and provide statistical guarantees for our Trimmed Graphical Lasso estimator as presented in Section 2 (Motivating Example 2); a sketch of the objective is given after this list
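
For context on this last highlight, one common way to write a trimmed Graphical Lasso objective is sketched below. The notation is ours rather than verbatim from the paper, and details (such as whether the diagonal of Θ is penalized) may differ:

```latex
% Trimmed Graphical Lasso objective (notation assumed): given samples
% x_1, ..., x_n in R^p, weights w select the h samples treated as clean.
\min_{\Theta \succ 0,\; w \in [0,1]^n} \;
  \operatorname{tr}\!\Big( \Big( \tfrac{1}{h} \sum_{i=1}^{n} w_i \, x_i x_i^{\top} \Big) \Theta \Big)
  - \log \det \Theta + \lambda \, \| \Theta \|_{1}
\quad \text{s.t.} \quad \textstyle\sum_{i=1}^{n} w_i = h .
```

Jointly minimizing over w and Θ trims the samples contributing the largest negative log-likelihood terms, mirroring the residual trimming used in regression.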


Summary

Introduction

We consider the problem of high-dimensional estimation, where the number of variables p may greatly exceed the number of observations n. The development and statistical analysis of structurally constrained estimators for high-dimensional estimation have recently attracted considerable attention. These estimators seek to minimize the sum of a loss function and a weighted regularizer. The desirable theoretical properties of such regularized M-estimators can be compromised, since outliers and corruptions are often present in high-dimensional data problems. These challenges motivate the development of robust structured learning methods that can cope with observations deviating from the model assumptions. The least median of squares estimator originally proposed by Rousseeuw (1984) avoids this problem, achieving a breakdown point of nearly 50%; a closely related approach is to 'trim' a portion of the largest residuals, as in Rousseeuw's Least Trimmed Squares (LTS) estimator. This led to the consideration of sparse Least Trimmed Squares (sparse LTS) for robust high-dimensional estimation.
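
For concreteness, sparse LTS (in the form popularized by Alfons, Croux, and Gelper, 2013) minimizes the sum of the h smallest squared residuals plus an ℓ1 penalty; in our notation:

```latex
% Sparse Least Trimmed Squares (notation ours): r^2_{(1)} <= ... <= r^2_{(n)}
% are the ordered squared residuals, and h <= n observations are retained.
\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p} \;
  \sum_{i=1}^{h} r^2_{(i)}(\beta) \; + \; h \, \lambda \, \| \beta \|_1 ,
\qquad r_i(\beta) = y_i - x_i^{\top} \beta .
```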

A General Framework for High-Dimensional Trimmed Estimators
Statistical Guarantees of Trimmed Estimators
Statistical Guarantees of High-Dimensional Least Trimmed Squares
Statistical Guarantees of Trimmed Graphical Lasso
Optimization for Trimmed Estimators
Simulated Data Experiments
Simulations for Sparse Logistic Regression
Simulations for Trace-Norm Regularized Regression
Simulations for Gaussian Graphical Models
Analysis of Yeast Genotype and Expression data
Method
Application to the analysis of Yeast Gene Expression Data
Concluding Remarks
Appendix A: Proof of Theorem 1
Appendix C: Results for Trimmed Graphical Lasso
Proof of Corollary 3
Proof of Corollary 5
Appendix D: Proof of Proposition 1