Automatic item generation has become a promising area in recent studies. In automatic item generation, items with targeted psychometric properties are generated during testing. The feasibility of automatic item generation lies in the fact that items are generated from a set of observable item stimulus features, which are mapped onto the cognitive variables underlying the item solution and are calibrated through cognitive psychometric models. Parameters of a generated item can then be predicted from the specific combination of the calibrated item stimulus features in the item. Predicted item parameters, compared to those calibrated from empirical data, involve more complex sources of uncertainty. Although the relationship between sufficiency of the cognitive model of item solving and the adequacy of item parameter prediction can be theoretically justified, the degree to which such predicted parameters impact various aspects of testing, however, is an empirical question and needs to be explored. This paper investigated the impact of predicted item parameters on ability estimates in a computerized adaptive testing environment, based on abstract reasoning test (ART) items which were generated using cognitive design system approach (Embretson, 1998). The item bank contained 150 items with two sets of item difficulties, of which one is predicted from the item design features and the other is calibrated from sample data. Each of the 263 subjects participated in the study received two subtests, of which one was based on predicted parameters, the other calibrated parameters. The item bank was split into two parallel halves based on predicted item parameters to prevent items in the bank from repeat administrations within subjects. Subjects were randomly assigned into one of the four testing procedures resulting from the combinations between parameter types (predicted versus calibrated) and item bank (first half versus second half). Results of the study showed a clear regression-to-the-mean effect of the predicted item parameters, compare to its counterpart of calibrated item parameters. Inward biases of ability estimates from subtest using predicted item parameters were observed when ability estimates were compared across different subtests within subjects. Compared to its counterpart using calibrated parameters, standard errors of ability estimates were larger for those from the subtest using predicted item parameters in the mid-range of the scale, where regression-to-the-mean effect of the predicted item parameters is minimal, and were smaller in the rest of the scale, possibly due to joint impact of increased uncertainty of predicted item parameters, estimation biases and limitation of the item bank at various levels of ability scale. When ability was estimated from the same subtest using different types of item parameters, very high correlation (.995) were obtained and no biases were observed throughout almost the entire scale. Standard errors of ability estimates were larger for predicted parameters yet the differences were small.
Read full abstract