Abstract
This research report examines the stability of item response theory (IRT) item parameter estimates when the items are calibrated on two different samples of examinees who responded to the items at two different points in time, i.e., the temporal stability of the parameter estimates. The data were collected from regular administrations of the College Board Admissions Testing Program Scholastic Aptitude Test (SAT) and Achievement Tests. The three-parameter logistic model was used to characterize the relationship between the underlying trait and performance on an item, and a variety of methods were used to assess parameter estimate stability.

Two important conclusions were drawn from the study. First, the stability of parameter estimates is clearly related to the type of test: estimates obtained from the SAT-verbal and SAT-mathematical sections are more likely to exhibit stability over time than those obtained for the Achievement Tests in Biology or in American History and Social Studies. Second, lack of stability of item parameter estimates appears to be more closely related to differences in ability among the calibration samples than to the lapse of time between administrations of the test, which in this study ranged from 14 to 52 months. It should be noted, however, that for the particular tests studied, ability differences between calibration samples appeared to be unrelated to the time elapsed between administrations. This may not be typical of other testing situations, where ability differences between calibration samples can be directly related to the length of time between test administrations. This could be brought about, for example, by changes in curricular emphasis.
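The three-parameter logistic model mentioned above gives the probability of a correct item response as a function of the examinee's latent trait. A minimal sketch, assuming the conventional parameterization with discrimination a, difficulty b, pseudo-guessing lower asymptote c, and the common scaling constant D = 1.7 (these symbol choices are standard in the IRT literature, not taken from the report):

```python
import math

def p_correct(theta, a, b, c, D=1.7):
    """Three-parameter logistic (3PL) item response function.

    theta: examinee ability (latent trait)
    a: item discrimination
    b: item difficulty
    c: pseudo-guessing parameter (lower asymptote)
    D: scaling constant (1.7 makes the logistic approximate the normal ogive)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# At theta == b, the curve sits halfway between c and 1:
# p_correct(0.0, a=1.0, b=0.0, c=0.2) == 0.6
```

Under this model, "parameter stability" means that the estimates of a, b, and c for an item remain close when the item is recalibrated on a new examinee sample.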