Abstract

The purpose of this study was to investigate the effects on IRT true‐score equating results of the characteristics of the linking items used to place parameter estimates for items in two test editions on the same scale prior to equating. The study was carried out using the three‐parameter logistic item response theory model and Monte Carlo procedures. So that the simulated data reflected actual test data, the true item parameters were taken from LOGIST calibrations of item responses from selected administrations of the verbal sections of the College Board Scholastic Aptitude Test (SAT‐V).

The study was conducted in two phases, with the second phase based on results derived in the first. In the first phase, the effects of the characteristics and number of linking items were investigated using two common linking or scaling designs: concurrent calibration and the characteristic curve transformation method. The study extended previous research by systematically investigating the number of linking items needed to ensure adequate equating results. In addition, the effects on equating of two different characteristics of the linking item parameter estimates were investigated: (1) items with parameter estimates having standard errors of estimation similar to those in typical SAT‐V common item sections, and (2) items with parameter estimates having small standard errors of estimation. Finally, the effect on true‐score equating results of using peaked and uniform ability distributions for the groups used to estimate the parameters of the items in the test editions to be equated was investigated.

In the second phase, the study systematically investigated the effects on equating results of common item sections containing a few items whose item response functions were not well fit by the three‐parameter logistic model, and of common item sections containing a few items on which the two groups taking the common item section responded differently. These effects were investigated using both the concurrent calibration and the characteristic curve transformation linking procedures for varying numbers of common items. In most cases, these common item sections were studied using only a uniform ability distribution and the characteristics of the parameter estimates selected as a result of the first phase of the study.

The results of Phases I and II indicate that, for data such as those simulated for this study, improved equating results are obtained when longer linking tests are used with a uniform distribution of examinee ability.

The results concerning the quality of the linking item parameter estimates were confounded by how well the various linking tests represented the total test. One possible conclusion from this aspect of the study is that it is more important to evaluate the relative efficiency of the linking items with respect to the total test than to evaluate the size of the standard errors of estimation of the linking item parameter estimates.

Finally, the study indicated that equating results, particularly those obtained using a characteristic curve transformation design, are affected by the presence of linking items that function differently for the two groups used to provide data for the equating and calibration. This is particularly true for shorter linking tests. It appears that the quality of an equating depends to some extent on prior screening of linking tests and removal of items that function differently for the two groups.
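For context, the key quantities referred to in the abstract can be summarized as follows; the notation is the conventional one for the three‐parameter logistic (3PL) model and is supplied here as background, not taken from the paper itself. Under the 3PL model, the probability that an examinee with ability \(\theta\) answers item \(i\) correctly is

\[
P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + \exp[-D a_i(\theta - b_i)]}, \qquad D \approx 1.7,
\]

where \(a_i\), \(b_i\), and \(c_i\) are the discrimination, difficulty, and lower‐asymptote (guessing) parameters. True‐score equating works through the test characteristic curves: the true score on form \(X\) at ability \(\theta\) is

\[
\tau_X(\theta) = \sum_{i \in X} P_i(\theta),
\]

and a number‐correct score \(x\) on form \(X\) is equated to form \(Y\) by finding the ability \(\theta_x\) with \(\tau_X(\theta_x) = x\) and taking \(\tau_Y(\theta_x)\) as the equated score. The characteristic curve transformation referred to above (commonly implemented as the Stocking–Lord procedure) chooses linking constants \(A\) and \(B\) for the ability transformation \(\theta = A\theta^* + B\), which places the second calibration on the scale of the first, by minimizing over a set of ability points

\[
F(A, B) = \sum_{j} \Bigl[\, \sum_{i \in V} P_i\bigl(\theta_j;\, \hat{a}_i, \hat{b}_i, \hat{c}_i\bigr) \;-\; \sum_{i \in V} P_i\bigl(\theta_j;\, \hat{a}^*_i/A,\; A\hat{b}^*_i + B,\; \hat{c}^*_i\bigr) \Bigr]^2,
\]

where \(V\) is the set of linking (common) items and the starred estimates come from the second calibration. Minimizing \(F\) places both sets of item parameter estimates on a common scale before the true‐score equating is carried out.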
