Abstract
Multistage tests are a widely used and efficient type of test presentation that aims to provide accurate ability estimates while keeping the test relatively short. Multistage tests typically rely on the psychometric framework of item response theory. Violations of item response models and other assumptions underlying a multistage test, such as differential item functioning, can lead to inaccurate ability estimates and unfair measurements. There is a practical need for methods to detect problematic model violations to avoid these issues. This study compares and evaluates three methods for the detection of differential item functioning with regard to continuous person covariates in data from multistage tests: a linear logistic regression test and two adaptations of a recently proposed score-based DIF test. While all tests show a satisfactory Type I error rate, the score-based tests show greater power against three types of DIF effects.
Highlights
Psychological and educational assessments typically use models of item response theory (IRT) to statistically describe respondent-test item interactions
The IRT framework further allows the application of advanced methods of test presentation, such as computerized adaptive testing (CAT) and multistage testing (MST)
We investigated how often individual modules were answered by 100 or more respondents under the various conditions of the simulation study, since this threshold determined whether the items in these modules were investigated for differential item functioning (DIF)
Summary
Psychological and educational assessments typically use models of item response theory (IRT) to statistically describe respondent-test item interactions. The IRT framework further allows the application of advanced methods of test presentation, such as computerized adaptive testing (CAT) and multistage testing (MST). The principal aim of CAT and MST is to provide an economical assessment of the abilities of the individual respondents by making the presented items dependent on the respondent’s performance on previous test items. They are widely used in educational and psychological testing, for instance, in the Programme for International. We will explain the key concepts underlying MST and its relation to CAT below; see [6,7,8] for more technical introductions.
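To make the routing idea concrete, the following is a minimal sketch of a two-stage multistage test in which the second module depends on the score in the first. It is not the design used in the study: the Rasch model, the module names, the item difficulties, the routing cutoff, and the standard normal ability distribution are all assumptions chosen for illustration. Only the threshold of 100 respondents per module comes from the text above.

```python
import math
import random


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Hypothetical two-stage design: a routing module followed by an easy or a hard module.
# All difficulties are invented for illustration.
MODULES = {
    "routing": [-1.0, -0.5, 0.0, 0.5, 1.0],
    "easy": [-2.0, -1.5, -1.0, -0.5, 0.0],
    "hard": [0.0, 0.5, 1.0, 1.5, 2.0],
}
CUTOFF = 3           # assumed rule: 3 or more correct routing items lead to "hard"
MIN_RESPONDENTS = 100  # threshold named in the text for investigating a module


def administer(theta, difficulties, rng):
    """Simulate one module and return the number of correct responses."""
    return sum(rng.random() < rasch_prob(theta, b) for b in difficulties)


def run_mst(theta, rng):
    """Route one respondent through both stages; return (second module, total score)."""
    score1 = administer(theta, MODULES["routing"], rng)
    second = "hard" if score1 >= CUTOFF else "easy"
    score2 = administer(theta, MODULES[second], rng)
    return second, score1 + score2


def module_counts(n_respondents, seed=1):
    """Count how many simulated respondents answer each module."""
    rng = random.Random(seed)
    counts = {name: 0 for name in MODULES}
    for _ in range(n_respondents):
        theta = rng.gauss(0.0, 1.0)  # assumed standard normal abilities
        second, _ = run_mst(theta, rng)
        counts["routing"] += 1
        counts[second] += 1
    return counts


counts = module_counts(500)
# A module would be examined for DIF only if enough respondents answered it.
eligible = {name: n >= MIN_RESPONDENTS for name, n in counts.items()}
print(counts, eligible)
```

Because every respondent answers the routing module but only a subset reaches each second-stage module, later modules can fall below the respondent threshold in small samples, which is why module exposure matters for DIF detection in MST data.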