Abstract

This article investigates the comparison of two groups based on the two-parameter logistic item response model. It is assumed that there is random differential item functioning (DIF) in item difficulties and item discriminations. The group difference is estimated using separate calibration with subsequent linking, as well as concurrent calibration. The following linking methods are compared: mean-mean linking, log-mean-mean linking, invariance alignment, Haberman linking, asymmetric and symmetric Haebara linking, different recalibration linking methods, anchored item parameters, and concurrent calibration. It is analytically shown that log-mean-mean linking and mean-mean linking provide consistent estimates if random DIF effects have zero means. The performance of the linking methods was evaluated through a simulation study. It turned out that (log-)mean-mean and Haberman linking performed best, followed by symmetric Haebara linking and a newly proposed recalibration linking method. Interestingly, linking methods frequently found in applications (i.e., asymmetric Haebara linking, recalibration linking used in a variant in current large-scale assessment studies, anchored item parameters, concurrent calibration) performed worse in the presence of random differential item functioning. In line with the previous literature, differences between linking methods turned out to be negligible in the absence of random differential item functioning. The different linking methods were also applied in an empirical example that performed a linking of PISA 2006 to PISA 2009 for Austrian students. This application showed that estimated trends in the means and standard deviations depended on the chosen linking method and the employed item response model.

Highlights

  • The analysis of educational and psychological tests is an important field in the social sciences

  • Linking methods frequently found in applications perform worse in the presence of random differential item functioning

  • A significant obstacle in applying linking methods is that the test items could behave differently in the two groups; that is, it cannot be expected that the two groups share a common set of statistical parameters for the test items


Summary

Introduction

The analysis of educational and psychological tests is an important field in the social sciences. This article investigates the two-parameter logistic (2PL; [15]) IRT model for comparing two groups on test items. A significant obstacle in applying linking methods is that the test items could behave differently in the two groups (i.e., differential item functioning); that is, it cannot be expected that the two groups share a common set of statistical parameters for the test items. Such a situation is important in educational large-scale assessment (LSA; [17,18,19]) studies in which several countries are compared. It can be expected that test items function differently because there are curricular differences between those countries.
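To make the idea of linking after separate calibration concrete, the following is a minimal sketch of mean-mean and log-mean-mean linking for the 2PL model. It is an illustration under stated assumptions, not the article's implementation: it assumes the parametrization P(X = 1 | theta) = logistic(a * (theta - b)), assumes that each group was calibrated separately with a standardized ability distribution, and uses hypothetical function names (irf_2pl, mean_mean_linking) and made-up item parameters.

```python
import numpy as np


def irf_2pl(theta, a, b):
    """2PL item response function, assuming the a * (theta - b) parametrization."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))


def mean_mean_linking(a1, b1, a2, b2, log=False):
    """Estimate the mean (mu) and SD (sigma) of group 2 on the scale of group 1.

    Assumption: both groups were calibrated separately with ability fixed to
    mean 0 and SD 1, so that without DIF a2 = sigma * a1 and
    b2 = (b1 - mu) / sigma hold for every item.
    """
    a1, b1, a2, b2 = (np.asarray(x, dtype=float) for x in (a1, b1, a2, b2))
    if log:
        # log-mean-mean: ratio of geometric means of the discriminations
        sigma = np.exp(np.mean(np.log(a2)) - np.mean(np.log(a1)))
    else:
        # mean-mean: ratio of arithmetic means of the discriminations
        sigma = np.mean(a2) / np.mean(a1)
    # b1 = sigma * b2 + mu holds item by item, so average over items
    mu = np.mean(b1) - sigma * np.mean(b2)
    return mu, sigma


# Made-up item parameters for group 1 (the reference group)
a1 = np.array([0.8, 1.0, 1.2, 1.5])
b1 = np.array([-1.0, 0.0, 0.5, 1.0])

# Group 2 parameters implied by mu = 0.3 and sigma = 1.2 when there is no DIF
true_mu, true_sigma = 0.3, 1.2
a2 = true_sigma * a1
b2 = (b1 - true_mu) / true_sigma

print(mean_mean_linking(a1, b1, a2, b2))
print(mean_mean_linking(a1, b1, a2, b2, log=True))
```

In this DIF-free toy example both variants recover the group mean and standard deviation up to floating-point error. With random DIF added to a2 and b2, the two estimators would generally differ, which is the situation the article studies.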

