Abstract

The aim of this study is to compare differential item functioning (DIF) detection methods—the simultaneous item bias test (SIBTEST), item response theory likelihood ratio (IRT-LR), Lord's chi-square (χ²), and Raju's area measures—on the basis of ability estimates obtained after purifying DIF items from the test, under conditions varying the ratio of items with DIF, the effect size of DIF, and the type of DIF. This is a simulation study, and 50 replications were conducted for each condition. To compare the DIF detection methods, error (RMSD) and a coefficient of concordance (Pearson's correlation coefficient) were calculated between the estimated and initial abilities of the reference group. Across all other conditions, the lowest error and the highest concordance were observed for the IRT-LR method when the test contained 10% uniform DIF. Moreover, for both SIBTEST and IRT-LR, in all conditions the error obtained by purifying only items with C-level DIF was lower than the error obtained by purifying items with both B- and C-level DIF. Similarly, for both methods, the concordance coefficient obtained by purifying only C-level DIF items was higher than that obtained by purifying items with both B- and C-level DIF.
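The two evaluation criteria named above—RMSD between estimated and initial abilities, and Pearson's correlation as the coefficient of concordance—can be sketched as follows. This is a minimal illustration with invented data, not the study's actual estimation procedure; the noise level and sample size are assumptions.

```python
import numpy as np

# Hypothetical illustration: compare initial (generating) abilities with
# abilities re-estimated after DIF purification, using the two criteria
# from the abstract. All values here are simulated for demonstration.
rng = np.random.default_rng(0)
theta_true = rng.normal(0.0, 1.0, size=1000)              # initial abilities
theta_est = theta_true + rng.normal(0.0, 0.3, size=1000)  # re-estimated abilities (assumed noise)

# Error criterion: root mean square difference between the two ability sets
rmsd = np.sqrt(np.mean((theta_est - theta_true) ** 2))

# Concordance criterion: Pearson correlation between the two ability sets
r = np.corrcoef(theta_est, theta_true)[0, 1]

print(f"RMSD = {rmsd:.3f}, Pearson r = {r:.3f}")
```

Lower RMSD and higher correlation indicate that purification recovered abilities closer to the generating values, which is how the methods are ranked in the study.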

Highlights

  • Tests which are used in education and psychology for various purposes should meet specific standards, such as validity, reliability, and practicality

  • After removing items with differential item functioning (DIF), in all conditions the item response theory likelihood ratio (IRT-LR) method showed the minimum error under the 10% DIF rate and uniform DIF type

  • When the coefficients of concordance were examined after removing DIF items, the IRT-LR method showed the maximum correlation under the 10% DIF rate and uniform DIF type


Introduction

Tests used in education and psychology for various purposes should meet specific standards, such as validity, reliability, and practicality. According to Messick (1995), these characteristics are the fundamental principles of measurement, alongside the social values applied by decision-makers. In this regard, items in a test should not advantage or disadvantage any subgroup at the same ability level. Bias can be defined as a systematic error in test scores that depends on group membership (Camilli & Shepard, 1994). Viewed from this aspect, bias is a major threat to the validity and objectivity of a test (Clauser & Mazor, 1998; Kristjansson, Aylesworth, McDowell, & Zumbo, 2005).

