Abstract

When randomized controlled trials are not possible, quasi-experimental methods often represent the gold standard. One quasi-experimental method is difference-in-differences (DiD), which compares changes in outcomes before and after treatment across treated and untreated groups to estimate a causal effect. DiD researchers often run fairly exhaustive robustness checks to verify that the assumptions of the design are met. However, less attention is typically paid to how item responses from the outcome measure are scored. For example, surveys are often scored by summing item responses to produce sum scores, and achievement tests often rely on scores produced by test vendors, which frequently employ a unidimensional item response theory (IRT) scoring model that implicitly assumes control and treatment participants are exchangeable (i.e., that they come from the same distribution). In this study, several IRT models that parallel the DiD design in terms of groups and timepoints are presented, and their performance is examined. Results indicate that using a scoring approach that parallels the DiD study design reduces bias and improves power, though these approaches can also lead to increased Type I error rates.
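For reference, the canonical two-group, two-period DiD estimand compares the pre-to-post change in the treated group with the corresponding change in the control group; the sketch below uses standard textbook notation, not notation taken from the paper itself:

\[
\hat{\tau}_{\mathrm{DiD}}
= \left(\bar{Y}_{\mathrm{treat,\,post}} - \bar{Y}_{\mathrm{treat,\,pre}}\right)
- \left(\bar{Y}_{\mathrm{control,\,post}} - \bar{Y}_{\mathrm{control,\,pre}}\right)
\]

Here \(\bar{Y}\) denotes the group-by-period mean of the scored outcome, so the choice of scoring model (sum scores versus an IRT model that reflects the group-by-time structure) directly affects the quantities entering this contrast.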
