Abstract

Differential item functioning (DIF) analyses are used to identify items that operate differently between two groups after controlling for ability. The Simultaneous Item Bias Test (SIBTEST) is a popular DIF detection method that matches examinees on a true score estimate of ability. However, in some testing situations, such as test translation and adaptation, the percentage of DIF items can be large, and the effectiveness of SIBTEST in these situations has not been thoroughly evaluated. This study addresses that problem. Four variables were manipulated in a simulation study: the amount of DIF on a 40-item test (20%, 40%, and 60% of the items had moderate and large DIF), the direction of DIF (balanced and unbalanced DIF items), sample size (500, 1,000, 1,500, and 2,000 examinees per group), and ability distribution differences between groups (equal and unequal). Each condition was replicated 100 times to facilitate computation of the DIF detection rates. The results indicated that SIBTEST yielded adequate DIF detection rates even when 60% of the items contained DIF, provided that DIF was balanced between the reference and focal groups and sample sizes were at least 1,000 examinees per group. SIBTEST also had adequate detection rates in the 20% unbalanced DIF conditions with samples of 1,000 examinees per group. However, SIBTEST had poor detection rates across all 40% and 60% unbalanced DIF conditions. Implications for practice and future directions for research are discussed.
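The simulation design described above can be sketched in miniature. The snippet below is an illustrative assumption, not the study's actual code: it generates 2PL item responses for a reference and a focal group, injects uniform DIF as a difficulty shift on the first 20% of a 40-item test, and computes a simplified SIBTEST-style statistic (a rest-score-matched difference in proportion correct, omitting SIBTEST's regression correction for measurement error). All item parameters, sample sizes, and the DIF magnitude are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_responses(theta, a, b, dif_shift=None):
    """Generate dichotomous 2PL responses; dif_shift (if given) adds
    extra difficulty to items for this group, i.e., uniform DIF."""
    bb = b if dif_shift is None else b + dif_shift
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - bb[None, :])))
    return (rng.random(p.shape) < p).astype(int)

n_items, n_per_group = 40, 1000
a = rng.uniform(0.8, 1.6, n_items)          # discriminations
b = rng.normal(0.0, 1.0, n_items)           # difficulties

# Unbalanced DIF condition: first 8 items (20%) are 0.6 logits
# harder for the focal group, all shifts in the same direction.
dif = np.zeros(n_items)
dif[:8] = 0.6

# Equal-ability condition: both groups drawn from N(0, 1).
theta_ref = rng.normal(0.0, 1.0, n_per_group)
theta_foc = rng.normal(0.0, 1.0, n_per_group)
X_ref = simulate_responses(theta_ref, a, b)
X_foc = simulate_responses(theta_foc, a, b, dif_shift=dif)

def beta_hat(X_ref, X_foc, item):
    """Simplified SIBTEST-style beta for one studied item: examinees
    are matched on the rest score (total score excluding the studied
    item), and the weighted mean reference-minus-focal difference in
    proportion correct is taken across score strata."""
    rest_ref = np.delete(X_ref, item, axis=1).sum(axis=1)
    rest_foc = np.delete(X_foc, item, axis=1).sum(axis=1)
    num, den = 0.0, 0
    for k in range(X_ref.shape[1]):         # rest-score strata
        r = X_ref[rest_ref == k, item]
        f = X_foc[rest_foc == k, item]
        if len(r) and len(f):               # keep only shared strata
            w = len(r) + len(f)
            num += w * (r.mean() - f.mean())
            den += w
    return num / den

betas = np.array([beta_hat(X_ref, X_foc, j) for j in range(n_items)])
```

Under this setup, the DIF items (here, items 0-7) should show noticeably larger positive beta values than the DIF-free items, since the focal group performs worse on them at matched rest scores. A full replication of the study would wrap this in a loop over the manipulated conditions and 100 replications, and would use SIBTEST's corrected statistic and significance test rather than this raw difference.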
