Abstract

Multistage tests are a widely used and efficient type of test presentation that aims to provide accurate ability estimates while keeping the test relatively short. Multistage tests typically rely on the psychometric framework of item response theory. Violations of item response models and other assumptions underlying a multistage test, such as differential item functioning, can lead to inaccurate ability estimates and unfair measurements. There is a practical need for methods to detect problematic model violations to avoid these issues. This study compares and evaluates three methods for the detection of differential item functioning with regard to continuous person covariates in data from multistage tests: a linear logistic regression test and two adaptations of a recently proposed score-based DIF test. While all tests show a satisfactory Type I error rate, the score-based tests show greater power against three types of DIF effects.
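As a rough illustration of the first of the compared approaches, the sketch below shows a logistic-regression DIF check for a single item with respect to a continuous person covariate, using a likelihood-ratio comparison of nested models. The column names, the use of an ability estimate as the matching variable, and the simulated toy data are assumptions made here for illustration only; they are not the study's actual specification.

```python
# Minimal sketch of a logistic-regression DIF check for one item with a
# continuous person covariate. Column names and the use of an estimated
# ability "theta" as the matching variable are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

def lr_dif_test(df: pd.DataFrame):
    """Likelihood-ratio DIF test for a single item.

    df must contain:
      correct -- 0/1 response to the item under investigation
      theta   -- ability estimate (matching variable)
      covar   -- continuous person covariate suspected of causing DIF
    """
    # Null model: the item response depends on ability only.
    m0 = smf.logit("correct ~ theta", data=df).fit(disp=False)
    # Alternative: add the covariate (uniform DIF) and its interaction
    # with ability (non-uniform DIF).
    m1 = smf.logit("correct ~ theta * covar", data=df).fit(disp=False)
    lr = 2 * (m1.llf - m0.llf)            # likelihood-ratio statistic
    df_diff = m1.df_model - m0.df_model   # two additional parameters
    p = stats.chi2.sf(lr, df_diff)
    return lr, p

# Toy usage with simulated DIF-free responses
rng = np.random.default_rng(1)
n = 500
theta = rng.normal(size=n)
covar = rng.normal(size=n)
correct = rng.binomial(1, 1 / (1 + np.exp(-(theta - 0.2))))
print(lr_dif_test(pd.DataFrame({"correct": correct,
                                "theta": theta,
                                "covar": covar})))
```

A small p-value in this kind of check would flag the item for DIF with respect to the covariate; the score-based tests compared in the study pursue the same goal through a different, model-based route.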

Highlights

  • Psychological and educational assessments typically use models of item response theory (IRT) to statistically describe respondent-test item interactions

  • The IRT framework further allows the application of advanced methods of test presentation, such as computerized adaptive testing (CAT) and multistage testing (MST)

  • We investigated how often individual modules were answered by 100 or more respondents under the various conditions of the simulation study, since this threshold determined whether the items in these modules were investigated for differential item functioning (DIF)

Introduction

Psychological and educational assessments typically use models of item response theory (IRT) to statistically describe respondent-test item interactions. The IRT framework further allows the application of advanced methods of test presentation, such as computerized adaptive testing (CAT) and multistage testing (MST). The principal aim of CAT and MST is to provide an economical assessment of the abilities of individual respondents by making the presented items dependent on the respondent's performance on previous test items. They are widely used in educational and psychological testing, for instance in the Programme for International Student Assessment (PISA). We explain the key concepts underlying MST and its relation to CAT below; see [6,7,8] for more technical introductions.
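To make the idea of performance-dependent item presentation concrete, the following sketch implements a hypothetical two-stage multistage design in which a routing module is scored and the second-stage module (easy, medium, or hard) is selected from that score. The module contents, cut scores, and 1-3 design are invented for illustration and do not correspond to any design used in the study.

```python
# A minimal, hypothetical sketch of the routing idea behind MST: the module
# presented in stage 2 depends on the respondent's performance in the
# stage-1 routing module. Module names, cut scores, and the two-stage
# design are invented for illustration.
import random
from typing import Callable, Dict, List

MODULES: Dict[str, List[str]] = {
    "routing": [f"R{i}" for i in range(1, 11)],  # stage 1: given to everyone
    "easy":    [f"E{i}" for i in range(1, 11)],  # stage 2 alternatives
    "medium":  [f"M{i}" for i in range(1, 11)],
    "hard":    [f"H{i}" for i in range(1, 11)],
}

def route(stage1_score: int) -> str:
    """Choose the stage-2 module from the stage-1 number-correct score."""
    if stage1_score <= 3:
        return "easy"
    if stage1_score <= 7:
        return "medium"
    return "hard"

def administer(answer_item: Callable[[str], bool]) -> dict:
    """Run a two-stage MST; answer_item(item) simulates one respondent."""
    score1 = sum(answer_item(item) for item in MODULES["routing"])
    module = route(score1)
    score2 = sum(answer_item(item) for item in MODULES[module])
    return {"stage1_score": score1, "stage2_module": module, "stage2_score": score2}

# Example: a respondent who answers roughly 80% of items correctly
random.seed(0)
print(administer(lambda item: random.random() < 0.8))
```

CAT applies the same adaptation principle at the level of individual items rather than pre-assembled modules.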
