Abstract

A new influence measure is proposed to assess the influence of individual observations on prediction mean square errors (PMSE) in variable selection problems. It is based on the estimated PMSE, which consists of Cook's distance and Mallows' Cp statistic. Another interpretation of Cook's distance is also given through the expression of the new influence measure. Illustrative examples show the effectiveness of the new influence measure.

Here we consider the detection of influential observations in variable selection problems in linear regression. Many studies have been published on the detection of influential observations. Cook and Weisberg (1982) and Chatterjee and Hadi (1988), for example, propose influence measures for each observation in the case of fixed variable subsets. A representative influence measure is Cook's distance, proposed by Cook (1977).

Some papers deal with the detection of influential observations when the variable subsets are not fixed. Weisberg (1981) derives an influence measure based on the Cp statistic suggested by Mallows (1973). This measure allocates the Cp value to individual observations and consists of residual and leverage parts, as is usual in standard regression diagnostics. Leger and Altman (1993) propose an influence measure that is Cook's distance computed from the difference between the predicted values of the response variable based on the variable subsets selected with all observations and without one observation. They give a sensitivity analysis combined with the variable selection problem, in which they consider Mallows' Cp statistic and stepwise regression procedures such as forward selection and backward elimination. Gupta and Huang (1996) introduce an influence measure as an alternative to Cook's distance to detect influential observations. They derive a measure of goodness of fit for the fitted models to select important variable subsets.
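
The two standard ingredients named above, Cook's distance and Mallows' Cp, can be sketched for the simplest case of a straight-line fit. The following is a minimal illustration using textbook definitions only; it is not the new influence measure proposed in the paper. The function names and the data are hypothetical, and the model is assumed to have p = 2 coefficients (intercept and slope).

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


def cooks_distance(x, y):
    """Cook's distance from residuals e_i and leverages h_i:
    D_i = e_i^2 * h_i / (p * s^2 * (1 - h_i)^2), with p = 2."""
    n, p = len(x), 2
    a, b = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    lev = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    return [e * e * h / (p * s2 * (1.0 - h) ** 2) for e, h in zip(resid, lev)]


def cooks_distance_by_deletion(x, y):
    """Same quantity by explicit deletion: refit without observation i and
    measure the shift in fitted values at all n points."""
    n, p = len(x), 2
    a, b = fit_line(x, y)
    s2 = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - p)
    out = []
    for i in range(n):
        ai, bi = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        shift = sum(((a + b * xj) - (ai + bi * xj)) ** 2 for xj in x)
        out.append(shift / (p * s2))
    return out


def mallows_cp(rss_subset, p_subset, s2_full, n):
    """Mallows' Cp = RSS_p / s^2_full - n + 2p for a subset with p coefficients."""
    return rss_subset / s2_full - n + 2 * p_subset


# Hypothetical data: the last point has high leverage and sits off the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
d = cooks_distance(x, y)
most_influential = max(range(len(d)), key=lambda i: d[i])
```

The deletion version mirrors the "with all observations and without one observation" comparison described above, except that here the variable subset is held fixed; re-running the selection step after each deletion would be a separate extension.
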
In this paper, we extend the above influence measures to assess the influence of an observation on prediction mean square errors for the selected variable subset. In usual regression diagnostics with Cook's distance, we detect influential
