Abstract

A simulation study of methods of assessing differential item functioning (DIF) in computer‐adaptive tests (CATs) was conducted by Zwick, Thayer, and Wingersky (in press; 1993). Results showed that modified versions of the Mantel‐Haenszel and standardization methods work well with CAT data. In that study, data were generated using the three‐parameter logistic (3PL) model, and this same model was assumed in obtaining item parameter estimates. In the current study, 3PL item response data were used, but the Rasch model was assumed in obtaining item parameter estimates, which, in turn, determined the information table to be used in the item selection algorithm. New Rasch‐based expected true scores were obtained for each examinee, based on responses to the CAT items. As in the previous study, the DIF statistics were highly correlated with the generating DIF, and the means and standard deviations of these statistics across items were close to their nominal values. There was, however, a tendency for the DIF statistics to be slightly smaller in magnitude than in the 3PL analysis, resulting in a lower probability of detecting items with extreme DIF. This reduced sensitivity appeared to be related to a degradation in the accuracy of matching. Expected true scores from the Rasch‐based CAT tended to be biased downward, particularly for lower‐ability examinees. Unlike the Rasch CAT scores, Rasch expected true scores based on nonadaptive administration of all pool items behaved quite well, as did the nonadaptive and CAT‐based expected true scores obtained using the 3PL model.
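For readers unfamiliar with the models involved, the sketch below gives the standard forms of the two item response functions and of the expected true score used as the DIF matching variable. The notation (including the 1.7 scaling constant in the 3PL model) follows a common convention and is assumed here rather than quoted from the paper.

```latex
% Standard item response functions (notation assumed; D = 1.7 is one common scaling convention).
% 3PL model: discrimination a_i, difficulty b_i, lower asymptote (guessing) c_i.
P_i^{\mathrm{3PL}}(\theta) \;=\; c_i + (1 - c_i)\,
  \frac{1}{1 + \exp\!\bigl\{-1.7\, a_i (\theta - b_i)\bigr\}}

% Rasch (1PL) model: the special case with a_i = 1 and c_i = 0.
P_i^{\mathrm{Rasch}}(\theta) \;=\; \frac{\exp(\theta - b_i)}{1 + \exp(\theta - b_i)}

% Expected true score over the J items in the pool, evaluated at the ability
% estimate \hat{\theta} obtained from the examinee's CAT responses.
\widehat{T} \;=\; \sum_{i=1}^{J} P_i\bigl(\hat{\theta}\bigr)
```

Because the Mantel‐Haenszel and standardization procedures compare groups of examinees matched on this expected true score, any systematic downward bias in the Rasch‐based score translates into less accurate matching, which is consistent with the reduced DIF sensitivity reported above.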
