ABSTRACT
This study used real data to construct testing conditions for comparing the results of chained linear, Tucker, and Levine‐observed score equating. Comparisons were made both when the new‐ and old‐form samples were similar in ability and when they differed in ability. The length of the anchor test was also varied to examine its effect on the three equating methods. Two tests were used in the study, and each of the three equating methods was compared to a criterion equating to obtain estimates of random equating error, bias, and root mean squared error (RMSE). Results showed that for most of the conditions studied, chained linear score equating produced fairly good results, with low bias and RMSE. In some conditions, Levine‐observed score equating also produced low bias and RMSE. Although the Tucker method always produced the lowest random equating error, it produced larger bias and RMSE than either of the other equating methods. Based on these results, it is recommended that either chained linear or Levine score equating be used when new‐ and old‐form samples differ in ability and/or when the anchor‐to‐total correlation is not very high.
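The three evaluation indices named above are related in a way that explains the Tucker result: a method can have the smallest random error and still have the largest RMSE if its bias is large. The sketch below illustrates this at a single raw-score point, using the conventional definitions of the indices. The abstract does not state how the study computed or aggregated them, so the function name `equating_error_summary`, its arguments, and the single-score-point setup are all illustrative assumptions, not the authors' procedure.

```python
import math
from statistics import fmean


def equating_error_summary(replicated, criterion):
    """Summarize equating error at one raw-score point (hypothetical helper).

    replicated: equated scores obtained from repeated samples
    criterion:  equated score given by the criterion equating
    Returns (bias, random_error, rmse) under the conventional definitions.
    """
    mean_equated = fmean(replicated)
    # Bias: systematic departure of the average equated score from the criterion.
    bias = mean_equated - criterion
    # Random equating error: spread of replications around their own mean.
    random_error = math.sqrt(fmean([(e - mean_equated) ** 2 for e in replicated]))
    # RMSE: spread of replications around the criterion (bias and random error combined).
    rmse = math.sqrt(fmean([(e - criterion) ** 2 for e in replicated]))
    return bias, random_error, rmse


# Made-up numbers for illustration only; these are not results from the study.
bias, random_error, rmse = equating_error_summary([51.0, 53.0], criterion=50.0)
```

With these definitions, RMSE squared equals bias squared plus random error squared, so a reduction in random error does not lower RMSE unless bias is also kept small.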