Abstract

As perfectly summarised by Ida Lawrence, “Testing is growing by leaps and bounds across the world. There is a realization that a nation’s well-being depends crucially on the educational achievement of its population. Valid tests are an essential tool to evaluate a nation’s educational standing and to implement efficacious educational reforms. Because tests consume time that otherwise could be devoted to instruction, it is important to devise tests that are efficient. Doing so requires a careful balancing of the contributions of technology, psychometrics, test design, and the learning sciences. Computer adaptive multistage testing (MSCAT) fits the bill extraordinarily well; unlike other forms of adaptive testing, it can be adapted to educational surveys and student testing. Research in this area will be evidence that the methodologies and underlying technology that surround MSCAT have reached maturity and that there is a growing acceptance by the field of this type of test design” (from the Foreword to D. Yan, A. A. von Davier, & C. Lewis (Eds.), Computerized multistage testing: Theory and applications). This state-of-the-art paper presents an overview of differential item functioning (DIF) in MSCAT using three-parameter logistic item response theory (IRT) and offers suggestions for implementing it in practice, in the hope of motivating testing and assessment researchers and practitioners to initiate projects in this under-practiced area by helping them better understand some of the relevant technical concepts.

Highlights

  • Item response theory: Dating back to the mid-twentieth century, a new theoretical basis for educational and psychological testing and measurement emerged; initially called latent trait theory, it is known nowadays as item response theory (IRT)

  • Though the concepts and issues in IRT can be hard for the novice tester to grasp, IRT-based research has attracted the attention of many researchers interested in measurement and testing for a number of reasons: (a) IRT makes it possible to compare the latent traits of individuals from different populations when they take tests or questionnaires that share certain common items; (b) it allows for the comparison of individuals from the same population who took entirely different tests; this is possible because in IRT the items, not the test or questionnaire as a whole, are regarded as central elements (Andrade et al. 2000)

  • Because items and individuals are placed on the same scale, the interpretation of the resulting scale is easier and users can see which items produce information throughout the scale (Embretson and Reise 2000); IRT also allows for the treatment of a group with missing data through the given responses alone, which is impossible in classical test theory (CTT); and it follows the principle of invariance, meaning that item parameters do not depend on the respondents’ latent traits and that person parameters do not depend on the particular items given (Hambleton et al. 1991); the item-information point is illustrated in the sketch after this list
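To make the item-information claim above concrete, here is a minimal Python sketch. The item parameters are hypothetical, and the standard 3PL item-information formula is assumed (the paper works with the three-parameter logistic model, but does not give this code):

    import numpy as np

    def p_3pl(theta, a, b, c):
        """3PL probability of a correct response at ability theta."""
        return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

    def item_information(theta, a, b, c):
        """Fisher information of a 3PL item at ability theta:
        I(theta) = a^2 * ((P - c)^2 / (1 - c)^2) * ((1 - P) / P)."""
        p = p_3pl(theta, a, b, c)
        return a**2 * ((p - c)**2 / (1.0 - c)**2) * ((1.0 - p) / p)

    # Hypothetical items: (discrimination a, difficulty b, pseudo-guessing c)
    items = [(1.8, -1.0, 0.20), (1.2, 0.0, 0.25), (2.0, 1.5, 0.15)]
    thetas = np.linspace(-3, 3, 7)
    for a, b, c in items:
        print(f"a={a}, b={b}, c={c}:", np.round(item_information(thetas, a, b, c), 2))

Each item’s information peaks near its difficulty, which is what lets test builders see where on the theta scale an item is most useful.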


Summary

Background

Item response theory. Dating back to the mid-twentieth century, a new theoretical basis for educational and psychological testing and measurement emerged; initially called latent trait theory, it is known nowadays as item response theory (IRT). Though the concepts and issues in IRT can be hard for the novice tester to grasp, IRT-based research has attracted the attention of many researchers interested in measurement and testing for a number of reasons: (a) IRT makes it possible to compare the latent traits of individuals from different populations when they take tests or questionnaires that share certain common items; (b) it allows for the comparison of individuals from the same population who took entirely different tests; this is possible because in IRT the items, not the test or questionnaire as a whole, are regarded as central elements (Andrade et al. 2000); (c) it allows a better analysis of each item that makes up the measurement instrument, since each item’s scale-building characteristics are considered; (d) items and individuals are placed on the same scale, so that the level of every single individual’s trait can be compared to the level of the trait that an item demands; this makes the interpretation of the resulting scale easier and lets users see which items produce information throughout the scale (Embretson and Reise 2000); (e) IRT allows for the treatment of a group with missing data through the given responses alone, which is impossible in CTT; and (f) it follows the principle of invariance, meaning that item parameters do not depend on the respondents’ latent traits and that person parameters do not depend on the particular items given (Hambleton et al. 1991). Under the three-parameter logistic (3PL) model, the probability that individual i answers item j correctly becomes (for further detailed explanations, see Curtis 2010):

$$P(Y_{ij} = 1 \mid \theta_i) = c_j + (1 - c_j)\,\frac{\exp\{a_j(\theta_i - b_j)\}}{1 + \exp\{a_j(\theta_i - b_j)\}},$$

where $\theta_i$ is examinee i’s latent trait and $a_j$, $b_j$, and $c_j$ are item j’s discrimination, difficulty, and pseudo-guessing parameters.
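As a concrete reading of this formula, the following minimal Python sketch computes the 3PL probability and uses it to place an examinee on the theta scale by grid-search maximum likelihood. All item parameters and responses are hypothetical, and the estimation routine is an illustration, not the authors’ procedure:

    import numpy as np

    def p_3pl(theta, a, b, c):
        """3PL probability P(Y_ij = 1): the formula above."""
        return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

    def estimate_theta(responses, a, b, c):
        """Grid-search maximum-likelihood estimate of theta, given a
        0/1 response vector and known item parameters."""
        a, b, c = (np.asarray(x, dtype=float) for x in (a, b, c))
        grid = np.linspace(-4.0, 4.0, 801)
        loglik = [np.sum(responses * np.log(p_3pl(t, a, b, c))
                         + (1 - responses) * np.log(1 - p_3pl(t, a, b, c)))
                  for t in grid]
        return grid[int(np.argmax(loglik))]

    # Hypothetical 5-item test and one examinee's responses
    a = [1.5, 1.0, 2.0, 0.8, 1.2]
    b = [-1.0, -0.5, 0.0, 0.5, 1.0]
    c = [0.20, 0.20, 0.25, 0.20, 0.20]
    responses = np.array([1, 1, 1, 0, 0])
    print("theta-hat:", estimate_theta(responses, a, b, c))

Because the person estimate and the item difficulties live on the same scale, comparisons such as “this examinee is above item 4’s difficulty” are meaningful, which is point (d) above.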
MSCAT and DIF
Findings
Conclusions