Abstract

An important step in scale development and assessment is evaluating differential item functioning (DIF) across segments of the population. Recent approaches use lasso regularization to detect DIF in all items simultaneously, avoiding the incorrect anchor-item assumptions that inflate error rates in classical DIF evaluation methods. Although promising, lasso methods yield underestimated standard errors and incorrect p-values. An alternative is Bayesian regularization, which provides empirical standard errors. However, we point out that using empirical criteria such as credible intervals to select DIF parameters has limited validity. We argue that a spike-and-slab prior with an inclusion probability criterion provides more theoretically coherent DIF selection and inference than either Bayesian regularizing priors with empirical selection rules or the frequentist lasso. We demonstrate this in simulation studies with multi-group item response theory and moderated nonlinear factor analysis models, and we discuss the practical utility of the spike-and-slab selection criterion.
