Abstract
The primary purpose of this study was to compare the bootstrap standard errors of five item response theory (IRT) equating methods for the common-item nonequivalent groups design. For IRT true-score equating (Method 1) and IRT observed-score equating (Method 2), IRT parameters for the two forms were estimated separately, and a linear scaling transformation was used to place the Form X parameter estimates on the Form Y scale. For IRT chained true-score equating (Method 3), IRT parameters for Form X and Form Y were also estimated separately, and chained true-score equating was then performed. For the last two methods, IRT parameters for both forms were estimated simultaneously, and these estimates were used to perform IRT true-score (Method 4) and observed-score (Method 5) equating. For each method, the standard error of equating at each raw score point on the new form was obtained as the standard deviation of the equated scores over 500 bootstrap replications. The bootstrap standard errors for Methods 4 and 5 were slightly smaller than those for Methods 1 and 2, and Method 3 produced the largest standard errors. However, the standard errors for all five methods were small enough to suggest that standard errors of equating below 0.1 standard deviation units could be obtained with any of the methods, even with sample sizes of 500.
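As an illustration only, the sketch below shows two pieces of the procedure summarized above. The first is IRT true-score equating for a Form X number-correct score, given item parameters that are already on a common (Form Y) scale. The second is the bootstrap standard error, taken as the standard deviation of equated scores across replications. The 3PL model, the scaling constant D = 1.7, the bisection solver, resampling examinees within each group, the n - 1 divisor, and the hypothetical callable `equate_fn` (which would have to rerun estimation, scaling, and equating on each resample) are all assumptions, not details taken from the study.

```python
import numpy as np

D = 1.7  # logistic scaling constant; an assumption, conventions vary


def p3pl(theta, a, b, c):
    """3PL probability of a correct response for each item at ability theta."""
    return c + (1 - c) / (1 + np.exp(-D * a * (theta - b)))


def true_score(theta, a, b, c):
    """Test characteristic curve: expected number-correct score at theta."""
    return float(np.sum(p3pl(theta, a, b, c)))


def irt_true_score_equate(x, params_x, params_y, lo=-20.0, hi=20.0, tol=1e-10):
    """Map Form X score x to the Form Y true-score scale.

    Assumes params_x = (a, b, c) has already been placed on the Form Y scale.
    Defined only for sum(c_X) < x < number of Form X items.
    """
    a, b, c = params_x
    if not (np.sum(c) < x < len(a)):
        raise ValueError("x must lie strictly between sum(c) and the number of items")
    # tau_X(theta) increases with theta when all a > 0, so bisection finds
    # the theta at which the Form X true score equals x.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if true_score(mid, a, b, c) < x:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    # The Form Y true score at that theta is the equated score.
    return true_score(theta, *params_y)


def bootstrap_se(equate_fn, data_x, data_y, scores, n_boot=500, seed=0):
    """Bootstrap standard error of equating at each raw score point.

    equate_fn(sample_x, sample_y, scores) is a hypothetical callable that
    reruns the full equating on resampled data and returns equated scores.
    Examinees are resampled with replacement within each group separately.
    """
    rng = np.random.default_rng(seed)
    reps = np.empty((n_boot, len(scores)))
    for r in range(n_boot):
        ix = rng.integers(0, len(data_x), len(data_x))
        iy = rng.integers(0, len(data_y), len(data_y))
        reps[r] = equate_fn(data_x[ix], data_y[iy], scores)
    # Standard deviation across replications, one value per raw score point.
    return reps.std(axis=0, ddof=1)
```

When the two forms have identical parameters on the same scale, true-score equating should return the input score, which is a quick sanity check on the solver. Bisection is used here only because the test characteristic curve is monotone; a Newton-Raphson step would also work.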