Detection and Diagnostic Methods of Multiple Influential Points in Binary Logistic Regression Model in Animal Breeding

Burcu Mestav

doi:10.29133/yyutbd.638226

Abstract

Multiple influential points adversely affect parameter estimation in binary logistic regression models and lead to misinterpretation of results. An influential point is a data point that does not follow the overall trend of the remaining data and takes an extreme value in x. Because the presence of roughly 10% influential points in a dataset is enough to affect parameter estimates, detecting and diagnosing these points is important. Both graphical methods (such as scatter plots and box plots) and analytical methods are used to detect and diagnose multiple influential points. Commonly used diagnostic methods include Pearson residuals, Standardized Pearson Residuals (SPR), Cook's Distance (CD), the hat matrix, DFFITS, and DFBETA. However, these methods suffer from masking and fail to diagnose the points when multiple influential points are present. To overcome this problem, statisticians have developed and proposed new diagnostic methods, such as the Generalized Standardized Pearson Residual (GSPR) and Generalized Weights (GW). This study used a dataset containing multiple influential points (15%) on weaning weight (WW), yearling weight (YW), fleece weight (FW), and fertility rate (FR) of Romney ewes, and modelled the effects of WW, YW, and FW on FR with a binary logistic regression model. The aim was to determine the multiple influential points by graphical methods and to examine how well the commonly used and the newly developed methods diagnose these data points. The commonly used methods were observed to mask multiple influential points, whereas the newly proposed methods identified them competently.
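To make the commonly used diagnostics concrete, the following is a minimal Python sketch of Pearson residuals, hat-matrix diagonals, SPR, and one common form of Cook's Distance for a binary logistic regression fitted by iteratively reweighted least squares. It is an illustration only, not the analysis carried out in this study: the data are synthetic (not the Romney ewe dataset), the function names `fit_logistic_irls` and `logistic_diagnostics` are hypothetical, and the scaling of Cook's Distance by the number of parameters is an assumed convention. GSPR and GW are not implemented here.

```python
import numpy as np


def fit_logistic_irls(X, y, n_iter=50, tol=1e-10):
    """Fit a binary logistic regression by Newton-Raphson (IRLS).

    X must already contain an intercept column. Hypothetical helper.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
        w = p * (1.0 - p)                        # binomial variance weights
        # Newton step: (X' W X)^{-1} X' (y - p)
        step = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def logistic_diagnostics(X, y, beta):
    """Return Pearson residuals, hat diagonals, SPR and Cook's Distance."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    pearson = (y - p) / np.sqrt(w)
    # Hat matrix H = W^{1/2} X (X' W X)^{-1} X' W^{1/2}; only its diagonal is needed.
    Xw = np.sqrt(w)[:, None] * X
    h = np.einsum("ij,jk,ik->i", Xw, np.linalg.inv(Xw.T @ Xw), Xw)
    spr = pearson / np.sqrt(1.0 - h)
    # Assumed convention: Cook's Distance scaled by the number of parameters k.
    k = X.shape[1]
    cook = spr**2 * h / (k * (1.0 - h))
    return pearson, h, spr, cook


# Synthetic example data (assumption: one covariate plus an intercept).
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_p = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x)))
y = (rng.uniform(size=n) < true_p).astype(float)

beta = fit_logistic_irls(X, y)
pearson, h, spr, cook = logistic_diagnostics(X, y, beta)

# Indices of the five observations with the largest Cook's Distance.
top5 = np.argsort(cook)[::-1][:5]
print("coefficients:", beta)
print("largest Cook's Distance at rows:", top5)
```

These are single-case deletion diagnostics, so each observation is assessed while all others remain in the fit. That is the setting in which several influential points can mask one another.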
