Modeling sparsely clustered data: design-based, model-based, and single-level methods.

Daniel M Mcneish

doi:10.1037/met0000024

Abstract

Recent studies have investigated the small sample properties of models for clustered data, such as multilevel models and generalized estimating equations. These studies have focused on parameter bias when the number of clusters is small, but very few studies have addressed the methods' properties with sparse data: a small number of observations within each cluster. In particular, studies have yet to address the properties of generalized estimating equations, a possible alternative to multilevel models often overlooked in behavioral sciences, with sparse data. This article begins with a discussion of population-averaged and cluster-specific models, provides a brief overview of both multilevel models and generalized estimating equations, and then conducts a simulation study on the sparse data properties of generalized estimating equations, multilevel models, and single-level regression models for both normal and binary outcomes. The simulation found generalized estimating equations estimate regression coefficients and their standard errors without bias with as few as 2 observations per cluster, provided that the number of clusters was reasonably large. Similar to the previous studies, multilevel models tended to overestimate the between-cluster variance components when the cluster size was below about 5.

Full Text