Abstract

In conventional structural equation modeling (SEM), with the presence of even a tiny amount of data contamination due to outliers or influential observations, normal-theory maximum likelihood (ML-Normal) is not efficient and can be severely biased. The multivariate-t-based SEM, which recently got implemented in Mplus as an approach for mixture modeling, represents a robust estimation alternative to downweigh the impact of outliers and influential observations. To our knowledge, the use of maximum likelihood estimation with a multivariate-t model (ML-t) to handle outliers has not been shown in SEM literature. In this paper we demonstrate the use of ML-t using the classic Holzinger and Swineford (1939) data set with a few observations modified as outliers or influential observations. A simulation study is then conducted to examine the performance of fit indices and information criteria under ML-Normal and ML-t in the presence of outliers. Results showed that whereas all fit indices got worse for ML-Normal with increasing amount of outliers and influential observations, their values were relatively stable with ML-t, and the use of information criteria was effective in selecting ML-normal without data contamination and selecting ML-t with data contamination, especially when the sample size was at least 200.

Highlights

  • Previous studies have shown that the use of robust covariance matrix based on weights corresponding to a multivariate t distribution provided good parameter estimates and likelihood ratio test statistic (LRT) statistics similar to those obtained without outliers under ML-Normal (Yuan and Bentler, 1998b), to our knowledge no analytic and simulation studies have evaluated the performance of LRT and fit indices obtained under the multivariate-t-based structural equation modeling (SEM) as implemented in Mplus (i.e., MLt), with df being estimated instead of specified by users

  • As pointed out in Yuan and Zhong (2013), unlike general statistics software where diagnostic tools for outliers and influential observations are common, such tools are rarely accessible for SEM software, partly because of the complexity of SEM modeling

  • Whereas robust SEM using Hubertype weights has been developed and shown to perform well, and the rsem package is freely available in R, many researchers are more familiar with other commonly used SEM software packages such as Mplus, and so it is important to have comparable tools for handling outliers and influential observations in other software

Read more

Summary

OUTLIERS AND INFLUENTIAL OBSERVATIONS

Whereas topics related to outliers, or more generally data contamination, are commonly discussed in quantitative research methodology textbooks, in practice researchers do not always agree on their definitions and how best to handle them. In regression with only one response variable, outliers are cases with a large deviation from its predicted value based on the regression line In multivariate analyses such as SEM, the distance of an observation from the center of most of the data points is commonly quantified by the Mahalanobis distance (d), where: di = (yi − μ)⊤ (yi − μ),. As discussed in Yuan and Zhong (2008), outliers in SEM have large values of e, and will inflate the covariance matrix of the outcome variables. It may or may not have large values in η. Influential observations can be good or bad: good influential observations have extreme ξ but not extreme e values, and will not negatively impact model fit as it is not considered outliers; bad influential observations, on the other hand, have both extreme ξ and e values, and will negatively impact model fit

Impact of Outliers and Influential Observations
Existing Robust Estimation Methods in SEM
REAL DATA DEMONSTRATION
SIMULATION STUDY
Simulation Results
DISCUSSION
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call