Abstract

Building models fully informed by theory is challenging when data sets are large and strong assumptions about all variables of interest and their interrelations cannot be made. Machine learning-inspired approaches have been gaining momentum in modeling such “big” data because they offer a systematic way to search for potential interrelationships among variables. In practice, researchers often start with a small model strongly guided by theory. In a second step, however, they quickly face the challenge of deciding which additional variables to include in or omit from the model. This situation calls for both a confirmatory statistical modeling approach and an exploratory statistical learning approach to data analysis within a single framework. Structural equation model (SEM) trees, a combination of SEM and decision trees (also known as classification and regression trees), offer a principled solution to this selection problem. SEM trees hierarchically split empirical data into homogeneous groups sharing similar data patterns by recursively selecting optimal predictors of these differences from a potentially large set of candidate variables. SEM forests extend SEM trees: they are ensembles of SEM trees, each built on a random sample of the original data. By aggregating over the trees in a forest, we obtain measures of variable importance that are more robust than measures from single trees. In the present chapter, we combine SEM trees and SEM-based continuous time modeling. The resulting approach, continuous time SEM trees, is illustrated by exploring dynamics in perceptual speed using data from the COGITO study.
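The recursive splitting idea behind SEM trees can be illustrated with a minimal sketch. It is a toy stand-in, not the chapter's method or any package's API: the "model" is a univariate normal distribution with free mean and variance instead of a full SEM, candidate covariates are assumed to be binary (coded 0/1), and a split is accepted when the likelihood ratio statistic exceeds 5.99 (the chi-square critical value for 2 degrees of freedom at the 5% level). All function names and the example data are hypothetical.

```python
import math
from statistics import fmean


def gaussian_loglik(values):
    """Maximized log-likelihood of a normal model with free mean and variance."""
    n = len(values)
    mean = fmean(values)
    var = sum((v - mean) ** 2 for v in values) / n  # maximum likelihood estimate
    var = max(var, 1e-12)  # guard against log(0) for constant data
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def best_split(rows, outcome, candidates, min_size=2):
    """Return (covariate, LR statistic) for the binary covariate that improves fit most."""
    base = gaussian_loglik([r[outcome] for r in rows])
    best = (None, 0.0)
    for cov in candidates:
        left = [r[outcome] for r in rows if r[cov] == 0]
        right = [r[outcome] for r in rows if r[cov] == 1]
        if len(left) < min_size or len(right) < min_size:
            continue
        # Fit the model separately in each subgroup and compare with the pooled fit.
        lr = 2 * (gaussian_loglik(left) + gaussian_loglik(right) - base)
        if lr > best[1]:
            best = (cov, lr)
    return best


def grow_tree(rows, outcome, candidates, threshold=5.99, depth=0, max_depth=3):
    """Recursively split the data until no candidate covariate passes the threshold."""
    cov, lr = best_split(rows, outcome, candidates)
    if cov is None or lr < threshold or depth >= max_depth:
        return {"leaf": True, "n": len(rows),
                "mean": fmean(r[outcome] for r in rows)}
    remaining = [c for c in candidates if c != cov]
    return {
        "leaf": False,
        "split": cov,
        "lr": lr,
        "0": grow_tree([r for r in rows if r[cov] == 0], outcome, remaining,
                       threshold, depth + 1, max_depth),
        "1": grow_tree([r for r in rows if r[cov] == 1], outcome, remaining,
                       threshold, depth + 1, max_depth),
    }


# Hypothetical data: "g" shifts the outcome strongly, "noise" is unrelated to it.
rows = [{"g": i % 2, "noise": (i // 2) % 2, "y": 10 * (i % 2) + 0.1 * (i % 5)}
        for i in range(40)]
tree = grow_tree(rows, "y", ["g", "noise"])
```

In this toy data set the tree should split on `g` and leave `noise` unused. A forest, as described above, would repeat `grow_tree` on random samples of `rows` and aggregate how much each covariate contributes across the resulting trees.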
