Abstract

Structural equation model (SEM) trees, a combination of SEMs and decision trees, have been proposed as a data-analytic tool for theory-guided exploration of empirical data. With respect to a hypothesized model of multivariate outcomes, such trees recursively find subgroups with similar patterns of observed data. SEM trees allow for the automatic selection of variables that predict differences across individuals in specific theoretical models, for instance, differences in latent factor profiles or developmental trajectories. However, SEM trees are unstable: small variations in the data can result in markedly different trees. As a remedy, SEM forests, which are ensembles of SEM trees based on resamplings of the original dataset, provide increased stability. Because large forests are less suitable for visual inspection and interpretation, aggregate measures provide researchers with hints on how to improve their models: (a) variable importance is based on random permutations of the out-of-bag (OOB) samples of the individual trees and quantifies, for each variable, the average reduction of uncertainty about the model-predicted distribution; and (b) case proximity enables researchers to perform clustering and outlier detection. We provide an overview of SEM forests and illustrate their utility in the context of cross-sectional factor models of intelligence and episodic memory. We discuss benefits and limitations, and provide advice on how and when to use SEM trees and forests in future research.
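
To make the forest-growing and permutation-importance mechanics concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation (their semtree package for R provides one): the per-node SEM is reduced to the simplest possible case, a normal distribution with freely estimated mean and variance, and each tree is a single likelihood-ratio split on a bootstrap sample. Helper names such as fit_stump and stump_nll are hypothetical.

```python
# Sketch: SEM-forest-style OOB permutation importance with a trivial "SEM"
# (a normal model per node). Hypothetical code, not the authors' software.
import numpy as np

rng = np.random.default_rng(0)

def nll(y, mu, var):
    """Negative log-likelihood of y under N(mu, var)."""
    return 0.5 * np.sum(np.log(2 * np.pi * var) + (y - mu) ** 2 / var)

def node_params(y):
    return y.mean(), y.var() + 1e-9

def fit_stump(X, y):
    """One likelihood-ratio split: try each covariate's median as threshold."""
    base = nll(y, *node_params(y))
    best = (None, None, -np.inf)            # (covariate, threshold, LL gain)
    for j in range(X.shape[1]):
        t = np.median(X[:, j])
        left = X[:, j] <= t
        if left.all() or (~left).all():
            continue
        gain = base - nll(y[left], *node_params(y[left])) \
                    - nll(y[~left], *node_params(y[~left]))
        if gain > best[2]:
            best = (j, t, gain)
    j, t, _ = best
    return {"j": j, "t": t,
            "left": node_params(y[X[:, j] <= t]),
            "right": node_params(y[X[:, j] > t])}

def stump_nll(tree, X, y):
    """Evaluate data under the tree's node-specific model-implied distributions."""
    left = X[:, tree["j"]] <= tree["t"]
    return nll(y[left], *tree["left"]) + nll(y[~left], *tree["right"])

# Toy data: covariate 0 truly shifts the mean of y; covariates 1-2 are noise.
n, B = 500, 200
X = rng.normal(size=(n, 3))
y = 0.8 * (X[:, 0] > 0) + rng.normal(size=n)

importance = np.zeros(X.shape[1])
for _ in range(B):                          # grow the forest on bootstrap samples
    inbag = rng.integers(0, n, n)
    oob = np.setdiff1d(np.arange(n), inbag)
    tree = fit_stump(X[inbag], y[inbag])
    base = stump_nll(tree, X[oob], y[oob])
    for j in range(X.shape[1]):             # permute each covariate among OOB cases
        Xp = X[oob].copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        importance[j] += (stump_nll(tree, Xp, y[oob]) - base) / B

print(importance)   # covariate 0 should dominate; noise covariates stay near 0
```

Permuting a covariate among the OOB cases destroys its association with the outcome; the resulting average increase in OOB negative log-likelihood is the "reduction of uncertainty" that the covariate had contributed.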

Highlights

  • We extend structural equation model (SEM) trees to SEM forests, following the seminal work on random forests by Breiman (2001a).

  • We present the procedure for growing a forest of SEM trees and describe two aggregate measures that allow researchers to obtain useful information about heterogeneity in their datasets: (a) variable importance, which quantifies the extent to which variables predict differences with respect to the initial SEM, and (b) case proximity, which enables researchers to perform case-based clustering based on a measure of similarity in predictor space (see the proximity sketch after this list).
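
As a loose illustration of case proximity, the sketch below substitutes a generic scikit-learn regression forest for a SEM forest; the logic carries over, since proximity between two cases is simply the fraction of trees that route both cases to the same terminal node. The clustering and outlier steps are illustrative assumptions, not the article's exact procedure.

```python
# Hedged illustration: case proximity from a generic scikit-learn forest used
# as a stand-in for a SEM forest. proximity(i, k) = share of trees in which
# cases i and k land in the same terminal node.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(float) + 0.1 * rng.normal(size=200)

forest = RandomForestRegressor(n_estimators=300, random_state=1).fit(X, y)
leaves = forest.apply(X)            # (n_cases, n_trees) terminal-node ids

# Proximity matrix: average agreement of terminal-node membership across trees.
prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

# Case-based clustering on 1 - proximity, treated as a distance.
dist = 1.0 - prox
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
clusters = fcluster(Z, t=2, criterion="maxclust")

# A simple outlier score: cases with low average proximity to all others.
outlier_score = 1.0 / (prox.mean(axis=1) + 1e-12)
print(clusters[:10], outlier_score.argmax())
```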

