Two-Step Robust Diagnostic Method for Identification of Multiple High Leverage Points

Bagheri Bagheri

doi:10.3844/jmssp.2009.97.106

Abstract

Problem statement: High leverage points are extreme outliers in the X-direction. In regression analysis, detecting these leverage points is important because they can have arbitrarily large effects on the estimates and can also cause multicollinearity problems. The Mahalanobis Distance (MD) has been used as a diagnostic tool for identifying outliers in multivariate analysis, where it measures the distance of each observation from the center of the data, so that abnormal observations stand apart from the normal group. Because the classical MD relies on non-robust classical estimates, it can hardly detect outliers accurately. As an alternative, Robust MD (RMD) methods based on the Minimum Covariance Determinant (MCD) and Minimum Volume Ellipsoid (MVE) estimators have been used to identify high leverage points in a data set. However, although these methods identify high leverage points correctly, they tend to swamp some low leverage points, that is, to flag them wrongly as high leverage points. Since the detection of leverage points is one of the most important issues in regression analysis, a new detection method for high leverage points is needed.

Approach: In this study, we proposed a relatively new two-step method for detecting high leverage points. In the first step, RMD (MVE) or RMD (MCD) was used to identify the suspected outlying points. In the second step, the MD was computed from the mean and covariance matrix of the clean data set, i.e., the data with the suspected points removed. We called this method the two-step Robust Diagnostic Mahalanobis Distance (RDMDTS); it identifies high leverage points correctly and swamps fewer low leverage points.

Results: The merit of the proposed method was investigated extensively with real data sets and a Monte Carlo simulation study. The results indicated that, for small sample sizes, the best detection method is RDMDTS (MVE)-mad, while for large sample sizes there is little difference between RDMDTS (MVE)-mad and RDMDTS (MCD)-mad.
Conclusion/Recommendations: To reduce the swamping of low leverage points as high leverage points, the proposed robust diagnostic methods RDMDTS (MVE)-mad and RDMDTS (MCD)-mad are recommended.
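
For reference, the three distances named in the abstract can be written as below. The notation is ours: $x_i$ is the $i$-th row of the $n \times k$ matrix $X$ of explanatory variables; $\bar{x}$ and $C$ are the classical sample mean and covariance of all $n$ rows; $T(X)$ and $V(X)$ are the MVE or MCD location and scatter estimates; $R$ is the set of rows not flagged by RMD in the first step, with classical mean $\bar{x}_R$ and covariance $C_R$. Evaluating every observation against the clean-set estimates in the second step is our reading of the text.

```latex
\begin{aligned}
\mathrm{MD}_i &= \sqrt{(x_i - \bar{x})^{\top} C^{-1} (x_i - \bar{x})}, \\
\mathrm{RMD}_i &= \sqrt{(x_i - T(X))^{\top} V(X)^{-1} (x_i - T(X))}, \\
\mathrm{RDMDTS}_i &= \sqrt{(x_i - \bar{x}_R)^{\top} C_R^{-1} (x_i - \bar{x}_R)}, \qquad i = 1, \dots, n.
\end{aligned}
```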

Highlights

Outliers are observations that break the pattern shown by the majority of the data set.
To improve the performance of DRGP (MVE), the diagnostic-robust generalized potentials proposed in [6], we follow the idea of Rousseeuw and Leroy [23] for developing robust multivariate estimators and propose a relatively new method for identifying high leverage points, called the two-step Robust Diagnostic Mahalanobis Distance (RDMDTS).
Two different cutoff points are considered: $\chi^2_{k,0.975}$, the 0.975 quantile of the chi-square distribution with k degrees of freedom, where k is the number of explanatory variables; and a newly proposed one, $\mathrm{Median}(\mathrm{RDMDTS}) + c\,\mathrm{MAD}(\mathrm{RDMDTS})$, where c is a constant.
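
The two steps and both cutoffs can be sketched in Python with NumPy. This is a minimal sketch under assumptions the text does not state. The robust fit in step one is a crude raw-MCD search (random (k+1)-subsets refined by concentration steps, with no consistency factor or reweighting), so it is not a production MCD estimator, and no MVE variant is shown. The same cutoff rule is applied in both steps. MAD is rescaled by 1.4826 and c = 3 by default. The chi-square rule compares squared distances with $\chi^2_{k,0.975}$, approximated by Wilson-Hilferty. All function names (`approx_mcd`, `mad_cutoff`, `chi2_cutoff`, `rdmdts`) are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np

# Hypothetical sketch of the two-step RDMDTS idea (MCD variant only).
# Assumptions: crude raw-MCD search, same cutoff rule in both steps,
# c = 3 and MAD rescaled by 1.4826 by default.


def _cov(A):
    """Sample covariance of the rows of A, always as a k x k matrix."""
    return np.atleast_2d(np.cov(A, rowvar=False))


def mahalanobis(X, center, cov):
    """Mahalanobis distance of every row of X from `center` under `cov`."""
    diff = X - center
    z = np.linalg.solve(cov, diff.T)  # columns are cov^{-1} (x_i - center)
    return np.sqrt(np.maximum(np.einsum("ij,ji->i", diff, z), 0.0))


def approx_mcd(X, n_starts=200, n_csteps=20, seed=0):
    """Raw MCD approximation: random (k+1)-subsets refined by C-steps.

    No consistency factor or reweighting is applied, so the scatter
    estimate is too small under normality.
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    h = (n + k + 1) // 2
    best_det, best = np.inf, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=k + 1, replace=False)
        for _ in range(n_csteps):
            cov = _cov(X[idx])
            if np.linalg.det(cov) <= 1e-12:
                break  # degenerate subset: abandon this start
            d = mahalanobis(X, X[idx].mean(axis=0), cov)
            idx = np.argsort(d)[:h]  # C-step: keep the h closest points
        det = np.linalg.det(_cov(X[idx]))
        if len(idx) == h and 1e-12 < det < best_det:
            best_det, best = det, idx
    if best is None:
        raise ValueError("no non-degenerate h-subset found")
    return X[best].mean(axis=0), _cov(X[best])


def mad_cutoff(d, c=3.0, scale=1.4826):
    """Median(d) + c * MAD(d); MAD is multiplied by `scale` (assumed 1.4826)."""
    med = np.median(d)
    return med + c * scale * np.median(np.abs(d - med))


def chi2_cutoff(k, p=0.975):
    """Wilson-Hilferty approximation to the chi-square quantile chi2_{k,p}."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * k)
    return k * (1.0 - a + z * math.sqrt(a)) ** 3


def _flag(d, k, rule, c):
    """Flag distances above the chosen cutoff ('mad' or 'chi2')."""
    if rule == "mad":
        return d > mad_cutoff(d, c)
    if rule == "chi2":
        return d ** 2 > chi2_cutoff(k)  # squared MD against chi2_{k,0.975}
    raise ValueError(f"unknown rule: {rule!r}")


def rdmdts(X, rule="mad", c=3.0, seed=0):
    """Two-step robust diagnostic MD; returns (distances, high-leverage flags)."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    # Step 1: robust distances from the approximate MCD fit flag suspects.
    t, v = approx_mcd(X, seed=seed)
    suspect = _flag(mahalanobis(X, t, v), k, rule, c)
    # Step 2: classical mean and covariance of the clean (unflagged) rows,
    # then distances of all n rows from that clean fit.
    clean = X[~suspect]
    d = mahalanobis(X, clean.mean(axis=0), _cov(clean))
    return d, _flag(d, k, rule, c)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # 45 clean rows plus 5 hypothetical high leverage rows near (8, 8).
    X = np.vstack([rng.normal(size=(45, 2)), rng.normal(8.0, 0.3, size=(5, 2))])
    d, flags = rdmdts(X)
    print("flagged rows:", np.flatnonzero(flags))
```

Because Median + c·MAD scales with the distances themselves, the MAD rule is unaffected by the missing MCD consistency factor, whereas the chi-square rule applied to this raw estimate would flag more clean points in step one. That is one reason the sketch defaults to the MAD rule.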

Summary

Introduction

Outliers are observations that break the pattern shown by the majority of the data set. They can be classified into the following categories:
(1) Good leverage points: observations that follow the same regression line as the rest of the data but lie far from the majority in the space of the explanatory variables.
(2) Bad leverage points: observations that deviate from the regression line followed by the rest of the data and also lie far from the majority in the space of the explanatory variables.
(3) Vertical outliers, or high y-residual outliers: observations that are not leverage points but have large residuals in the response variable[19].
Rousseeuw and Van Zomeren[25] pointed out that high leverage points can affect the estimated slope of the Ordinary Least Squares (OLS) regression line and may therefore cause more serious problems than other outliers, which might only affect the estimated intercept. Their presence in a regression model may also make some low leverage points appear as high leverage points, and vice versa.
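
A toy example of our own, with hypothetical data, illustrates the point attributed to Rousseeuw and Van Zomeren[25]. Starting from ten points exactly on y = 2 + 0.5x, adding a vertical outlier at the mean of x moves only the OLS intercept, while adding a single bad leverage point far out in x changes the slope, here even reversing its sign. The helper `ols` is hypothetical and simply wraps `numpy.polyfit`.

```python
import numpy as np


def ols(x, y):
    """Ordinary least squares line fit; returns (intercept, slope)."""
    slope, intercept = np.polyfit(x, y, 1)  # highest degree first
    return intercept, slope


# Hypothetical clean data lying exactly on y = 2 + 0.5 x, with x = 1, ..., 10.
x = np.arange(1.0, 11.0)
y = 2.0 + 0.5 * x

# Vertical outlier: placed at mean(x) = 5.5, so it is not a leverage point.
x_v, y_v = np.append(x, 5.5), np.append(y, 20.0)

# Bad leverage point: far from the other x values and off the line.
x_l, y_l = np.append(x, 30.0), np.append(y, 2.0)

cases = {"clean": (x, y), "vertical outlier": (x_v, y_v),
         "bad leverage point": (x_l, y_l)}
for label, (xs, ys) in cases.items():
    b0, b1 = ols(xs, ys)
    print(f"{label:>20}: intercept = {b0:7.3f}, slope = {b1:7.3f}")
```

With the vertical outlier the slope stays at 0.5 and the intercept rises to about 3.39; with the bad leverage point the slope drops to about -0.03.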

Journal: Journal of Mathematics and Statistics	Publication Date: Feb 1, 2009
Citations: 27	License type: cc-by

Similar Papers

The performance of diagnostic-robust generalized potentials for the identification of multiple high leverage points in linear regression
M Habshah ... A H.M Rahmatullah Imon. Journal of Applied Statistics, Vol. 36, 01 May 2009.

Robust logistic regression in the presence of high leverage points
Mohammed A Mohammed. Journal of Al-Qadisiyah for Computer Science and Mathematics, Vol. 11, 04 Sep 2019.

Robust Logistic Diagnostic for the Identification of High Leverage Points in Logistic Regression Model
B.A Syaiba ... M Habshah. Journal of Applied Sciences, Vol. 10, 15 Nov 2010.

Small-sample correction factor of the minimum covariance determinant estimator
Eduardo Castaño-Tostado. Communications in Statistics - Simulation and Computation, Vol. 29, 01 Jan 1999.
