Abstract

In statistical data analysis, a model can fit poorly in the presence of outliers. Residuals are well established as a tool for identifying outliers, and their asymptotic properties can be used to construct diagnostic tools. However, most existing diagnostic methods fail to identify multiple outliers. This paper therefore proposes a diagnostic method for identifying multiple outliers in generalized linear models (GLM), where traditional outlier detection methods are ineffective because they suffer from masking or swamping. An investigation was carried out to assess the capability of the proposed GSCPR method. The numerical examples indicated that the proposed method performed satisfactorily in identifying multiple outliers. In the simulation study, two scenarios were considered to assess the validity of the proposed method. The proposed method consistently showed a higher percentage of correct detection, as well as lower rates of swamping and masking, regardless of sample size and contamination level.

Highlights

  • The generalized linear model (GLM) is an extension of the familiar linear regression model for modeling a nonnormal response variable [1]

  • Cordeiro and McCullagh [5] derived the formulae for first-order biases of maximum likelihood estimates of linear parameters, linear predictors, dispersion parameter, and fitted values in GLM

  • Cordeiro and Simas [7] obtained an explicit formula for the density of the Pearson residuals to order n^{-1}, which holds for all continuous GLM, and defined corrected residuals for these models


Summary

Introduction

The generalized linear model (GLM) is an extension of the familiar linear regression model for modeling a nonnormal response variable [1]. A group of outliers can distort the fit of a model, because the outliers can have artificially small residuals and so appear as inliers [9]. This problem is known as the masking effect. Imon and Hadi [9] proposed a generalized version of standardized Pearson residuals based on a group deletion method (GSPR) to overcome the difficulty of detecting multiple outliers in logistic regression. These developments motivated the researchers to modify the corrected Pearson residuals [7] to handle the problems related to multiple outliers.
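
The masking effect and the idea behind group deletion can be illustrated with a minimal sketch. This is not the GSCPR or GSPR procedure itself: it assumes the simplest GLM (Gaussian response, identity link, fitted by ordinary least squares), uses raw rather than standardized residuals, and runs on a small synthetic data set made up purely for illustration. All variable and function names are hypothetical.

```python
import numpy as np

# Synthetic data (made up for illustration only): ten clean points lying
# exactly on y = x, plus a group of three identical outliers at x = 20, y = 0.
x_clean = np.arange(10, dtype=float)
y_clean = x_clean.copy()
x_out = np.full(3, 20.0)
y_out = np.zeros(3)

x = np.concatenate([x_clean, x_out])
y = np.concatenate([y_clean, y_out])
is_outlier = np.concatenate([np.zeros(10, dtype=bool), np.ones(3, dtype=bool)])


def fit_line(xs, ys):
    """Least-squares straight line; returns [slope, intercept]."""
    return np.polyfit(xs, ys, 1)


def residuals(coef, xs, ys):
    """Raw residuals y - fitted value (the Pearson residual when V(mu) = 1)."""
    return ys - np.polyval(coef, xs)


# Fit on all 13 points: the outlier group drags the line toward itself.
coef_full = fit_line(x, y)
res_full = residuals(coef_full, x, y)

# Group deletion: fit on the remaining points only, then compute residuals
# for every observation, including the deleted group.
coef_del = fit_line(x[~is_outlier], y[~is_outlier])
res_del = residuals(coef_del, x, y)

print("full fit,     max |residual| of outliers:", np.abs(res_full[is_outlier]).max())
print("full fit,     max |residual| of clean   :", np.abs(res_full[~is_outlier]).max())
print("deletion fit, min |residual| of outliers:", np.abs(res_del[is_outlier]).min())
```

In the full fit the outliers' residuals are smaller in magnitude than the largest residual among the clean points, so the outliers are masked and some clean points look suspicious (swamping). Once the group is deleted before fitting, the outliers' residuals are large and the clean points' residuals are essentially zero. In practice the suspect group is not known in advance, so a real procedure also needs a rule for choosing which observations to delete.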

Residuals in GLM
Identification of Multiple Outliers by Using GSCPR
Example Using Real Data Set
Simulation Results
Conclusion