Detection of Differential Item Functioning for More Than Two Groups: A Monte Carlo Comparison of Methods

W Holmes Finch

doi:10.1080/08957347.2015.1102916

Abstract

Differential item functioning (DIF) assessment is a crucial component of test construction, serving as the primary way in which instrument developers ensure that measures perform in the same way for multiple groups within the population. When this is not the case, scores may not accurately reflect the trait of interest for all individuals in the population. Most DIF research has focused on the two-group case. In practice, however, researchers may wish to investigate DIF for more than two groups, defined for example by examinee ethnicity, nation of origin, or treatment condition. DIF detection methods for such cases have been proposed, but little empirical work has been done to investigate their performance. The goal of the current study was therefore to use simulation to compare four proposed methods for assessing DIF in the multiple-group case: the Generalized Mantel-Haenszel test, Generalized Logistic Regression, Lord’s chi-square test, and the multiple group alignment procedure. Results showed that the Generalized Mantel-Haenszel and alignment procedures provided the best combination of Type I error control and power.

Full Text