Abstract

This study examines differences in equating outcomes between two trend score equating designs that result from two different scoring strategies when operational constructed-response (CR) items are double-scored: the single group (SG) design, in which each trend CR item is double-scored, and the nonequivalent groups with anchor test (NEAT) design, in which each trend CR item is single-scored during trend score equating. Sample sizes of n = 150, 200, 250, 300, and 400 were examined. Overall, results suggest larger equating errors with smaller sample sizes, though errors were small regardless of sample size. The NEAT design performed about as well as the SG design with respect to conditional and summative standard errors of equating, though it tended to produce larger bias and root mean squared differences (RMSDs). When accounting for the total number of trend scores required to conduct the analyses, the NEAT design performed as well as or better than the SG design (e.g., when the NEAT n = 150 and the SG n = 300). This result might be partially attributable to a larger operational sample size (n = 792) and a relatively high correlation between anchor and total scores for the trend sample (r = 0.73). These results suggest that, under these testing conditions, the NEAT design performed about as well as the SG design, but further research is required to assess the generalizability of the results.
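To make the reported summary statistics concrete, the sketch below (in Python/NumPy) shows how conditional bias, standard error of equating (SEE), and RMSD are typically computed across replications of an estimated equating function against a criterion. This is a minimal illustration only: the data, the criterion function, and the noise model are invented for demonstration, and the study's actual resampling design and equating method are not reproduced here.

```python
import numpy as np

# Hypothetical illustration of bias / SEE / RMSD summaries for an equating.
# All values below are invented; they do not come from the study.
rng = np.random.default_rng(42)

score_points = np.arange(0, 51)   # raw-score scale, 0..50 (assumed)
criterion = score_points + 1.5    # "true" criterion equating (assumed)

def simulate_equatings(n, R=500):
    """Simulate R replications of an estimated equating function,
    with sampling noise shrinking as sample size n grows (~1/sqrt(n))."""
    noise = rng.normal(0.0, 5.0 / np.sqrt(n), size=(R, score_points.size))
    return criterion + noise

for n in (150, 300):
    est = simulate_equatings(n)
    mean_est = est.mean(axis=0)
    bias = mean_est - criterion                            # conditional bias
    see = est.std(axis=0, ddof=1)                          # conditional SEE
    rmsd = np.sqrt(((est - criterion) ** 2).mean(axis=0))  # conditional RMSD
    print(f"n={n}: mean |bias|={np.abs(bias).mean():.3f}, "
          f"mean SEE={see.mean():.3f}, mean RMSD={rmsd.mean():.3f}")
```

Averaging the conditional quantities over score points, as in the final print statement, gives the kind of summative error indices the abstract compares across the SG and NEAT designs; with this toy noise model, the errors at n = 300 are visibly smaller than at n = 150, mirroring the reported sample-size effect.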
