Abstract

This study examines differences in equating outcomes between two trend score equating designs that arise from two different scoring strategies when operational constructed-response (CR) items are double-scored: the single group (SG) design, in which each trend CR item is double-scored, and the nonequivalent groups with anchor test (NEAT) design, in which each trend CR item is single-scored during trend score equating. The designs were compared across varying sample sizes (n = 150, 200, 250, 300, 400). Overall results suggest larger equating errors with smaller sample sizes, though errors were small regardless of sample size. The NEAT design performed about as well as the SG design with respect to conditional and summative standard errors of equating, though it tended to produce larger bias and root mean-squared differences (RMSDs). When accounting for the total number of trend scores required for the analyses, the NEAT design performed as well as or better than the SG design (e.g., when the NEAT n = 150 and the SG n = 300). This result may be partially attributable to a larger operational sample size (n = 792) and a strong correlation between anchor and total score for the trend sample (r = 0.73). These results suggest that, under these testing conditions, the NEAT design performed about as well as the SG design, but further research is required to assess the generalizability of the results.
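For context, the evaluation indices named above (bias, conditional standard error of equating, and RMSD) are conventionally computed from R replicated equatings against a criterion equating function. The following definitions are a standard sketch, not necessarily the paper's exact estimators; here $\hat{e}_r(x_j)$ denotes the equated score at raw score point $x_j$ in replication $r$, and $e(x_j)$ the criterion equating:

\[
\bar{e}(x_j) = \frac{1}{R}\sum_{r=1}^{R}\hat{e}_r(x_j), \qquad
\mathrm{Bias}(x_j) = \bar{e}(x_j) - e(x_j),
\]
\[
\mathrm{SE}(x_j) = \sqrt{\frac{1}{R}\sum_{r=1}^{R}\bigl(\hat{e}_r(x_j) - \bar{e}(x_j)\bigr)^{2}}, \qquad
\mathrm{RMSD}(x_j) = \sqrt{\mathrm{Bias}(x_j)^{2} + \mathrm{SE}(x_j)^{2}}.
\]

Summative (overall) versions of these indices typically weight the conditional values by the raw-score distribution, e.g. $\mathrm{SE} = \sqrt{\sum_j w_j\,\mathrm{SE}(x_j)^{2}}$ with weights $w_j$ summing to one.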
