Abstract

This paper proposes a new method for solving the linear regression problem when the number of observations $n$ is smaller than the number of predictors $v$. The method builds on the idea of graphical models and provides unbiased parameter estimates under certain conditions, whereas existing methods such as ridge regression, LASSO and least angle regression (LARS) give biased estimates. The new method also provides a detailed graphical correlation structure for the predictors, so the real causal relationship between predictors and response could be identified. In contrast, existing methods often cannot identify the truly important predictors, those with possible causal effects on the response variable. Unlike existing methods based on graphical models, the proposed method can identify the potential networks while doing regression even if the data do not follow a multivariate distribution. The new method is compared with existing methods such as ridge regression, LASSO and LARS on simulated and real data sets. Our experiments reveal that the new method outperforms all the other methods when $n < v$.

Highlights

  • Consider a linear regression model with a univariate response, v covariates and n independent and identically distributed (i.i.d.) observations

  • Many methods have been proposed for this model when n < v, such as the Least Absolute Shrinkage and Selection Operator (LASSO) (Tibshirani, 1996), Least Angle Regression (LARS) (Efron et al., 2004) and ridge regression (Hoerl & Kennard, 1970)

  • We provide a detailed simulation study showing that our GLSE has much smaller bias than existing methods such as LASSO, ridge regression and LARS


Summary

Introduction

Consider a linear regression model with a univariate response, v covariates and n independent and identically distributed (i.i.d.) observations. A model selected by LASSO or LARS can include at most n covariates (Zou & Hastie, 2005; McCann & Welsch, 2007). This is problematic in areas where more, or even all, covariates have to be included in the model. Ridge regression can include all covariates in the model, but its biased estimates make it difficult to justify the significance level of each covariate. It can also lead to a non-sparse model that is difficult to interpret when the number of features is large (Yuan et al., 2007). The estimates from such methods are still biased, which might not be recommended in general (Washington et al., 2010; Zhang, 2010).
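To make the n < v setting concrete, the following is a minimal NumPy sketch, not the method proposed in the paper. It uses hypothetical toy data (the sizes, seed, true coefficients and penalty `lam` are all illustrative assumptions) to show why ordinary least squares has no unique solution when n < v, and how ridge regression keeps every covariate while shrinking the coefficients.

```python
import numpy as np

# Hypothetical toy data: sizes, seed, and coefficients are illustrative only.
rng = np.random.default_rng(0)
n, v = 20, 50                      # fewer observations than covariates
X = rng.standard_normal((n, v))
beta_true = np.zeros(v)
beta_true[:5] = 2.0                # only the first five covariates matter
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# rank(X^T X) = rank(X) <= min(n, v) = n < v, so the v x v matrix X^T X is
# singular and ordinary least squares has no unique solution.
rank = np.linalg.matrix_rank(X)

# Minimum-norm solution among the infinitely many least squares solutions.
beta_min_norm = np.linalg.pinv(X) @ y

# Ridge estimate: (X^T X + lam * I)^{-1} X^T y, well defined for any lam > 0.
lam = 1.0                          # arbitrary penalty, chosen for illustration
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(v), X.T @ y)

# Ridge keeps all v covariates but shrinks the coefficient vector toward zero.
n_nonzero = int(np.sum(np.abs(beta_ridge) > 1e-8))
print("rank of X:", rank, "with n =", n, "and v =", v)
print("nonzero ridge coefficients:", n_nonzero)
print("norm (min-norm LS):", np.linalg.norm(beta_min_norm))
print("norm (ridge):      ", np.linalg.norm(beta_ridge))
```

The ridge coefficient vector always has a smaller norm than the minimum-norm least squares solution for any positive penalty. This shrinkage is the source of the bias discussed above, and it comes with a model in which all v coefficients are generally nonzero.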

