Abstract

This paper considers approaches and methods for reducing the influence of multicollinearity. Particular attention is paid to the use of shrinkage estimators for this purpose. Two classes of regression models are investigated: the first corresponds to systems with negative feedback, while the second represents systems without such feedback. In the first case the use of shrinkage estimators, especially the Principal Component estimator, is inappropriate; in the second case it is possible with the right choice of the regularization parameter or of the number of principal components included in the regression model. This fact is substantiated by a study of the distribution of the random variable b − β, where b is the LS estimate and β is the true coefficient, since the form of this distribution is the basic characteristic of the specified classes. For this study, a regression approximation of this distribution, based on the Edgeworth series, was developed. Alternative approaches to resolving the multicollinearity issue are also examined, including an application of the known Inequality Constrained Least Squares method and the Dual estimator method proposed by the author. It is shown that with a priori information the Euclidean distance between the estimates and the true coefficients can be significantly reduced.
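
For reference, the two shrinkage estimators named in the abstract can be written compactly. The sketch below is illustrative only, not the paper's implementation: it assumes centered data, and the function names and arguments (X, y, the ridge parameter k, the number of retained components r) are the editor's conventions. It computes the ridge estimate (X'X + kI)⁻¹X'y and the Principal Component estimate obtained by regressing on the r leading principal components.

```python
# Minimal sketch of two classical shrinkage estimators (illustrative only).
import numpy as np

def ridge_estimator(X, y, k):
    """Ridge shrinkage estimate: (X'X + k*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def pc_estimator(X, y, r):
    """Principal Component estimate using the r leading components of X'X."""
    eigvals, V = np.linalg.eigh(X.T @ X)       # eigenvalues in ascending order
    V_r = V[:, np.argsort(eigvals)[::-1][:r]]  # eigenvectors of the r largest
    Z = X @ V_r                                # principal-component scores
    gamma = np.linalg.solve(Z.T @ Z, Z.T @ y)  # LS fit in component space
    return V_r @ gamma                         # map back to the x-coordinates
```

Dropping small-eigenvalue components (or inflating the diagonal by k) is exactly what trades a little bias for a large reduction in variance, which is why the choice of r or k is central to whether these estimators help.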

Highlights

  • In the statistical literature, the term “multicollinearity” is almost as popular as the term “regression”

  • The article analyzes the empirical approaches and the statistical methods that help reduce the influence of multicollinearity on the estimation of coefficients in linear regression

Introduction

The term “multicollinearity” is almost as popular as the term “regression”. This is natural, since regression analysis is one of the most powerful tools for revealing dependencies hidden in empirical data, while multicollinearity is one of the main obstacles to its use. A high correlation between two or more of the explanatory variables (predictors) sharply increases the variance of estimators, which adversely affects the study of the degree and direction of a predictor's action on the response variable. Multicollinearity also impairs the predictive power of the regression equation when the correlation in new data differs significantly from that in the training set. That is why the estimation of regression parameters under multicollinearity remains one of the priorities of applied and theoretical statistics.
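
To make the variance effect concrete, here is a small simulation, purely illustrative and not taken from the paper: two predictors with correlation ρ are generated repeatedly, and the sampling variance of the LS coefficient estimates is measured. It grows roughly like 1/(1 − ρ²), the familiar variance inflation factor. All variable names and the chosen values of n, β, and ρ are the editor's assumptions.

```python
# Illustrative simulation: LS estimate variance under correlated predictors.
import numpy as np

rng = np.random.default_rng(0)
n, beta = 500, np.array([1.0, 1.0])
for rho in (0.0, 0.9, 0.99):
    cov = [[1.0, rho], [rho, 1.0]]          # predictor correlation matrix
    b_samples = []
    for _ in range(2000):
        X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        y = X @ beta + rng.standard_normal(n)
        b_samples.append(np.linalg.lstsq(X, y, rcond=None)[0])  # LS estimate
    print(rho, np.var(b_samples, axis=0))   # estimate variance grows with rho
```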
