Abstract

Many regularization schemes for high-dimensional regression have been put forward. Most require the choice of a tuning parameter, using model selection criteria or cross-validation. We show that sign-constrained least squares estimation is a very simple and effective regularization technique for a certain class of high-dimensional regression problems. The sign constraint has to be derived from prior knowledge or an initial estimator. Its success depends on conditions that are easy to check in practice. A sufficient condition for our results is that most variables with the same sign constraint are positively correlated. For a sparse optimal predictor, a non-asymptotic bound on the $\ell_{1}$-error of the regression coefficients is then proven. Without using any further regularization, the regression vector can be estimated consistently as long as $s^{2}\log(p)/n\rightarrow 0$ for $n\rightarrow\infty$, where $s$ is the sparsity of the optimal regression vector, $p$ the number of variables and $n$ the sample size. The bounds are almost as tight as similar bounds for the Lasso under strongly correlated design, even though the method has no tuning parameter and requires no cross-validation. Network tomography is shown to be an application where the necessary conditions for the success of sign-constrained least squares are naturally fulfilled, and empirical results confirm the effectiveness of the sign constraint for sparse recovery when predictor variables are strongly correlated.
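
To fix ideas (notation assumed here for illustration rather than quoted from the abstract): with $Y\in\mathbb{R}^{n}$ the response, $X\in\mathbb{R}^{n\times p}$ the design matrix, and the signs arranged so that every constrained coefficient is non-negative, the estimator in question is of the form

$\hat{\beta}\in\arg\min\{\,\|Y-X\beta\|_{2}^{2}\;:\;\beta\in\mathbb{R}^{p},\ \min_{k}\beta_{k}\geq 0\,\},$

which involves no tuning parameter.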

Highlights

  • High-dimensional regression problems are characterized by a large number of predictor variables in relation to sample size

  • We study the performance of non-negative least squares type problems under a so-called Positive Eigenvalue Condition, which can be checked for any given dataset by solving a quadratic programming problem (see the sketch after this list)

  • Our results imply that non-negative least squares (NNLS) can be very effective if (a) the signs of the regression coefficients are known or can be estimated and (b) the Positive Eigenvalue Condition holds
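
As a rough illustration of that check (a minimal sketch, not taken from the paper: it assumes the condition amounts to the minimum of the quadratic form $\beta^{\top}(X^{\top}X/n)\beta$ over non-negative $\beta$ with $\|\beta\|_{1}=1$ being bounded away from zero, after flipping all sign constraints to non-negativity), the quadratic program can be solved with off-the-shelf tools:

```python
# Hypothetical sketch: estimate the Positive Eigenvalue constant
#     nu = min { b' (X'X/n) b : b >= 0, sum(b) = 1 }
# for a given design matrix X. A value of nu clearly above zero suggests
# the condition holds for this dataset; the exact form is an assumption here.
import numpy as np
from scipy.optimize import minimize

def positive_eigenvalue_constant(X):
    n, p = X.shape
    Sigma = X.T @ X / n                          # empirical Gram matrix

    fun = lambda b: b @ Sigma @ b                # quadratic objective
    jac = lambda b: 2.0 * Sigma @ b              # gradient (Sigma is symmetric)

    b0 = np.full(p, 1.0 / p)                     # feasible start: uniform weights
    res = minimize(
        fun, b0, jac=jac, method="SLSQP",
        bounds=[(0.0, None)] * p,                # b >= 0
        constraints=[{"type": "eq", "fun": lambda b: b.sum() - 1.0}],  # sum(b) = 1
    )
    return res.fun

# Strongly positively correlated columns tend to give a large constant.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1)) + 0.3 * rng.standard_normal((200, 50))
print(positive_eigenvalue_constant(X))
```

For larger $p$, a dedicated quadratic programming solver would be preferable to the general-purpose SLSQP routine used above.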

Summary

Introduction

High-dimensional regression problems are characterized by a large number of predictor variables in relation to the sample size. Imposing a sign constraint might seem like a very weak regularization, but it will be shown that the resulting estimator is remarkably different from the unregularized least squares estimator: it can cope with high-dimensional problems in which the number of predictor variables vastly exceeds the sample size. The estimator will be shown to be consistent as long as the underlying optimal prediction is sufficiently sparse (i.e., it uses only a small subset of all predictor variables) and the so-called Positive Eigenvalue Condition is fulfilled. Using the same Positive Eigenvalue Condition (there called a self-regularizing design condition), a bound on the prediction error of NNLS and a sparse recovery property after hard thresholding are shown in Slawski et al. (2011). An $\ell_{1}$-bound on the difference between the NNLS estimator and the optimal regression coefficients is shown in Section 3, along with a bound on the prediction error.
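
The estimator itself is straightforward to compute. The following is a minimal sketch (assumptions, not the paper's code: the helper name and the reduction to scipy.optimize.nnls are illustrative), in which sign constraints obtained from prior knowledge or an initial estimator are absorbed by flipping the corresponding columns of the design matrix:

```python
# Hypothetical sketch: sign-constrained least squares via NNLS.
# `signs` is a vector of +1 / -1 constraints, assumed to come from prior
# knowledge or an initial estimator (as described above).
import numpy as np
from scipy.optimize import nnls

def sign_constrained_lstsq(X, y, signs):
    Xs = X * signs                  # flip columns so every constraint reads beta >= 0
    beta_nn, _ = nnls(Xs, y)        # solve min ||y - Xs b||_2  subject to  b >= 0
    return signs * beta_nn          # map back to the original sign pattern

# Toy example: sparse non-negative truth, positively correlated predictors, p >> n.
rng = np.random.default_rng(1)
n, p, s = 100, 300, 5
X = rng.standard_normal((n, 1)) + rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 1.0
y = X @ beta + 0.1 * rng.standard_normal(n)

beta_hat = sign_constrained_lstsq(X, y, np.ones(p))
print(np.abs(beta_hat - beta).sum())    # l1-error, the quantity bounded in Section 3
```

Note that no tuning parameter and no cross-validation step appears anywhere in this computation.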

Notation and Assumptions
Compatibility Condition
Positive Eigenvalue Condition
Main Results
Numerical Results
Discussion
Proof of Theorem 2
Lemmata