Abstract

influence.ME provides tools for detecting influential data in mixed effects models. The application of these models has become common practice, but the development of diagnostic tools has lagged behind. influence.ME calculates standardized measures of influential data for the point estimates of generalized mixed effects models, such as DFBETAS, Cook's distance, as well as percentile change and a test for changing levels of significance. influence.ME calculates these measures of influence while accounting for the nesting structure of the data. The package and measures of influential data are introduced, a practical example is given, and strategies for dealing with influential data are suggested.
The application of mixed effects regression models has become common practice in the field of social sciences. As used in the social sciences, mixed effects regression models take into account that observations on individual respondents are nested within higher-level groups such as schools, classrooms, states, and countries (Snijders and Bosker, 1999), and are often referred to as multilevel regression models. Despite these models' increasing popularity, diagnostic tools to evaluate fitted models lag behind. We introduce influence.ME (Nieuwenhuis, Pelzer, and te Grotenhuis, 2012), an R-package that provides tools for detecting influential cases in mixed effects regression models estimated with lme4 (Bates and Maechler, 2010). It is commonly accepted that tests for influential data should be performed on regression models, especially when estimates are based on a relatively small number of cases. However, most existing procedures do not account for the nesting structure of the data. As a result, these existing procedures fail to detect that higher-level cases may be influential on estimates of variables measured at specifically that level.
In this paper, we outline the basic rationale on detecting influential data, describe standardized measures of influence, provide a practical example of the analysis of students in 23 schools, and discuss strategies for dealing with influential cases. Testing for influential cases in mixed effects regression models is important, because influential data negatively influence the statistical fit and generalizability of the model. In social science applications of mixed models the testing for influential data is especially important, since these models are frequently based on large numbers of observations at the individual level while the number of higher level groups is relatively small. For instance, Van der Meer, te Grotenhuis, and Pelzer (2010) were unable to find any country-level comparative studies involving more than 54 countries. With such a relatively low number of countries, a single country can easily be overly influential on the parameter estimates of one or more of the country-level variables.

Highlights

  • The application of mixed effects regression models has become common practice in the field of social sciences

  • We introduce influence.ME (Nieuwenhuis, Pelzer, and te Grotenhuis, 2012), an R-package that provides tools for detecting influential cases in mixed effects regression models estimated with lme4 (Bates and Maechler, 2010)

  • We outline the basic rationale on detecting influential data, describe standardized measures of influence, provide a practical example of the analysis of students in 23 schools, and discuss strategies for dealing with influential cases

Summary

Detecting Influential Data

All cases used to estimate a regression model exert some level of influence on the regression parameters. If a single case has extremely high or low scores on the dependent variable relative to its expected value (given other variables in the model), on one or more of the independent variables, or both, this case may overly influence the regression parameters by 'pulling' the estimated regression line towards itself. If a case has very extreme scores on the independent variable(s) but is fitted very well by a regression model, and if this case has a low deleted (standardized) residual, this case is not necessarily overly influencing the outcomes of the regression model. We introduce the measure of percentile change and a test for changing levels of significance of the fixed parameters. Up to this point, this discussion on influential data was limited to how single cases can overly influence the point estimates (or BETAS) of a regression model. Inferences made to the population from models in which such cases are present may be incorrect.
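
As a rough illustration of how one extreme case can pull a regression line towards itself, the sketch below refits a simple least-squares line with each case left out in turn and standardizes the change in the slope, in the spirit of DFBETAS. This is not influence.ME code: the package is written in R and works on mixed effects models estimated with lme4. The data, the function names fit_line and slope_dfbetas, and the choice to divide by the slope's standard error from the model without the case are assumptions made for illustration only. The 2/sqrt(n) cutoff is a commonly used rule of thumb, not a formal test.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, se_b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return a, b, se_b


def slope_dfbetas(xs, ys):
    """Standardized change in the slope when each case is deleted in turn."""
    _, b_full, _ = fit_line(xs, ys)
    values = []
    for i in range(len(xs)):
        xs_del = xs[:i] + xs[i + 1:]
        ys_del = ys[:i] + ys[i + 1:]
        _, b_del, se_del = fit_line(xs_del, ys_del)
        # Assumption: standardize by the standard error of the reduced model.
        values.append((b_full - b_del) / se_del)
    return values


# Hypothetical data: seven cases close to a line, one extreme final case.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 14.0]

dfbetas = slope_dfbetas(xs, ys)
cutoff = 2 / math.sqrt(len(xs))  # rule-of-thumb threshold

for i, value in enumerate(dfbetas):
    flag = "influential?" if abs(value) > cutoff else ""
    print(f"case {i + 1}: DFBETAS = {value:8.3f} {flag}")
```

In a mixed effects setting the same deletion logic would be applied to whole higher-level groups (for example, all students in one school) rather than to single observations, which is the situation the package is designed for.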

Detecting Influential Data in Mixed Effects Models
The Outcome Measures
Test for changes in significance
Visual Examination
Calculating measures of influence
Dealing with Influential Data