Abstract

Structural equation model (SEM) trees are data-driven tools for finding variables that predict group differences in SEM parameters. SEM trees build upon the decision tree paradigm by growing tree structures that divide a data set recursively into homogeneous subsets. In past research, SEM trees have been estimated predominantly with the R package semtree. The original algorithm in the semtree package selects split variables among covariates by calculating a likelihood ratio for each possible split of each covariate. Obtaining these likelihood ratios is computationally demanding. As a remedy, we propose to guide the construction of SEM trees by a family of score-based tests that have recently been popularized in psychometrics (Merkle and Zeileis, 2013; Merkle et al., 2014). These score-based tests monitor fluctuations in case-wise derivatives of the likelihood function to detect parameter differences between groups. Compared to the likelihood-ratio approach, score-based tests are computationally efficient because they do not require refitting the model for every possible split. In this paper, we introduce score-guided SEM trees, implement them in semtree, and evaluate their performance by means of a Monte Carlo simulation.
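The idea of monitoring fluctuations in case-wise derivatives can be illustrated with a deliberately small sketch. It is not the semtree implementation: it assumes a univariate normal model in place of an SEM, tracks only the mean parameter, and uses hypothetical names (`score_process`, `dm_stat`). The model is fitted once, the case-wise scores are ordered by the covariate, and their scaled cumulative sum is inspected. For a single parameter and a continuous covariate, the maximum absolute value of this process is a double-maximum-type statistic, whose asymptotic 5% critical value is roughly 1.358 (the supremum of an absolute Brownian bridge).

```python
import math
import random


def score_process(x, covariate):
    """Scaled cumulative score process for the mean of a normal model.

    The model is estimated once on all cases; no refitting per cut point.
    """
    n = len(x)
    mu_hat = sum(x) / n
    var_hat = sum((xi - mu_hat) ** 2 for xi in x) / n
    # Case-wise scores: derivative of each case's log-likelihood
    # with respect to the mean, evaluated at the ML estimates.
    scores = [(xi - mu_hat) / var_hat for xi in x]
    # Order the cases by the covariate under investigation.
    order = sorted(range(n), key=lambda i: covariate[i])
    # Scale by sqrt(n * information); the information for the mean is 1 / var_hat.
    scale = math.sqrt(n / var_hat)
    path = []
    running = 0.0
    for i in order:
        running += scores[i]
        path.append(running / scale)
    return path


# Simulated data (an assumption for illustration): the mean shifts by one
# standard deviation for cases whose covariate exceeds 0.5.
random.seed(1)
n = 400
covariate = [random.random() for _ in range(n)]
x = [random.gauss(1.0 if c > 0.5 else 0.0, 1.0) for c in covariate]

path = score_process(x, covariate)
dm_stat = max(abs(v) for v in path)
print(round(dm_stat, 2))
```

Because the scores sum to zero at the maximum likelihood estimate, the process returns to zero at its end; a large excursion in between indicates that the parameter differs along the covariate.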

Highlights

  • Structural equation models (SEMs; Bollen, 1989; Kline, 2016) are a widely applied technique in social and psychological research to model the relationships between multiple variables

  • Multi-group structural equation models (MGSEMs) were more powerful than all SEM tree methods, given continuous and ordinal covariates, but powerful in conditions with dichotomous covariates and without noise variables, where cut points did not need to be learned from the data

  • When provided with ordinal or dichotomous covariates, naïve trees showed an adequate control of type I errors and were among the best-performing methods in terms of power to detect heterogeneity and group recovery


Summary

INTRODUCTION

Structural equation models (SEMs; Bollen, 1989; Kline, 2016) are a widely applied technique in social and psychological research to model the relationships between multiple variables. The computational demand of SEM trees grows with the number of covariates that have many unique values, because every potential cut point requires the estimation of SEMs. Locating the optimal cut point in categorical, ordinal, and continuous covariates with the maximum of the likelihood ratios has important implications for the test statistic shown in Equation 3. MaxLR, CvM, maxLM, and maxLMO trees proved to be the more powerful methods for detecting heterogeneity in the random effects. We expected this behavior because the DM and WDM test statistics focus on heterogeneity in a single parameter, whereas all other methods monitor group differences in multiple parameters. MGSEMs were more powerful than all SEM tree methods, given continuous and ordinal covariates, but powerful in conditions with dichotomous covariates and without noise variables, where cut points did not need to be learned from the data. It seems generally advisable to fully explore differences in all parameters or to use focus parameters rather than taking the risk of misspecifying trees by using inadequate equality constraints.
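The cost of the exhaustive likelihood-ratio search can be made concrete with a minimal sketch. It assumes a univariate normal model as a stand-in for an SEM, and the names `normal_loglik`, `max_lr_split`, and `min_n` are hypothetical rather than part of semtree. Every admissible cut point triggers two subgroup fits, so the number of fits grows linearly with the number of unique covariate values. Because the reported statistic is the maximum over many likelihood ratios, it should not be referred to an ordinary chi-square distribution without adjustment.

```python
import math
import random


def normal_loglik(x):
    """Maximized log-likelihood of a normal model with free mean and variance."""
    n = len(x)
    mu = sum(x) / n
    var = sum((xi - mu) ** 2 for xi in x) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def max_lr_split(x, covariate, min_n=10):
    """Exhaustive search for the cut point with the largest likelihood ratio.

    Returns the maximal LR statistic, the cut point, and the number of model fits.
    """
    pairs = sorted(zip(covariate, x))
    cs = [c for c, _ in pairs]
    xs = [v for _, v in pairs]
    ll_full = normal_loglik(xs)
    n_fits = 1  # the model fitted to the full sample
    best_lr, best_cut = -math.inf, None
    for k in range(min_n, len(xs) - min_n + 1):
        if cs[k - 1] == cs[k]:
            continue  # tied covariate values do not define a cut point
        # Two new fits per candidate: one for each subgroup.
        lr = 2.0 * (normal_loglik(xs[:k]) + normal_loglik(xs[k:]) - ll_full)
        n_fits += 2
        if lr > best_lr:
            best_lr, best_cut = lr, (cs[k - 1] + cs[k]) / 2.0
    return best_lr, best_cut, n_fits


# Simulated data (an assumption for illustration): the mean shifts at 0.5.
random.seed(2)
n_obs = 200
cov = [random.random() for _ in range(n_obs)]
y = [random.gauss(1.5 if c > 0.5 else 0.0, 1.0) for c in cov]

lr_stat, cut, n_fits = max_lr_split(y, cov)
print(round(lr_stat, 1), round(cut, 3), n_fits)
```

In this toy model each fit is a closed-form computation; with an SEM, each of these fits would be an iterative optimization, which is what makes the exhaustive search expensive.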
