In academic achievement and ability testing programs, item response theory (IRT) methods have been widely used for test equating. The standard error (SE) of equating indicates the amount of random error, due to the sampling of examinees, in estimating the population equating relationship. The purpose of the present study was to use computer simulations to examine the accuracy and performance of the delta method formulas for estimating the SEs of IRT true score and observed score equating under the random groups design. Test type (3PL model test, 3PL+GPC model test, and 3PL+GR model test) and sample size were considered as simulation factors. The main results were as follows. First, for all test types, the theoretical SEs of IRT equating estimated by the delta method were very close to the empirical SEs computed from the simulated equated scores. Second, the SEs of IRT equating decreased as the sample size increased and were approximately inversely proportional to the square root of the sample size. Third, except at the extreme (lowest or highest) test scores, the SEs of IRT observed score equating tended to be slightly smaller than the SEs of IRT true score equating. Fourth, on average, the SEs of IRT equating for the 3PL model test, which consisted of 30 multiple-choice items, were smaller than those for the mixed-format tests, which consisted of the same 30 multiple-choice items and 10 constructed-response items.
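As a rough illustration of the second finding, the sketch below estimates an empirical SE from simulated replications and shows how it scales with sample size. It rests on simplifying assumptions: it uses mean equating under a random groups design as a stand-in for IRT true score or observed score equating, and the normal score distributions, the function names `simulate_equated_score` and `empirical_se`, and the replication count are hypothetical choices, not the settings of the study.

```python
import random
import statistics


def simulate_equated_score(n, rng, x=15.0):
    """Return the Form Y equivalent of Form X score x for one simulated replication.

    Assumption: mean equating is used here purely as a simple stand-in
    for IRT equating; the score distributions are hypothetical.
    """
    # Random groups design: two independent samples, one per form.
    form_x = [rng.gauss(15.0, 5.0) for _ in range(n)]
    form_y = [rng.gauss(16.0, 5.0) for _ in range(n)]
    # Mean equating shifts x by the difference between the form means.
    return x - statistics.fmean(form_x) + statistics.fmean(form_y)


def empirical_se(n, replications=300, seed=0):
    """Empirical SE: standard deviation of the equated score over replications."""
    rng = random.Random(seed)
    equated = [simulate_equated_score(n, rng) for _ in range(replications)]
    return statistics.stdev(equated)


if __name__ == "__main__":
    for n in (500, 2000):
        se = empirical_se(n)
        # Columns: sample size, empirical SE, SE scaled by sqrt(sample size).
        print(n, round(se, 4), round(se * n ** 0.5, 2))
```

If the SE is inversely proportional to the square root of the sample size, the last printed column should stay roughly constant, and quadrupling the sample size should roughly halve the SE.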