Abstract

Research on test equating is important for examination fairness, item banking, the assessment of teaching quality, and computerized adaptive testing. As research on examinations has developed, testlets have appeared increasingly often in examinations, for example in reading comprehension, mathematics, and map-based items. How to equate tests composed of testlets is therefore a practical problem. When item response theory (IRT) models are used for equating, a strong statistical assumption, local independence (LI), must hold. However, previous studies have shown that local independence is likely to be violated when a test contains testlets. Consequently, if this local dependence is ignored and testlet-based tests are equated with a standard IRT model, the equating coefficients can be distorted. To address this problem, we use a testlet-based model, the two-parameter testlet model (2PTM), which extends the two-parameter logistic (2PL) IRT model by adding a random-effect parameter for each testlet and thereby accounts for local dependence. This paper presents IRT characteristic curve equating methods and the specific procedures for calculating the equating coefficients. Extensive Monte Carlo simulations were conducted, evaluating how well the equating coefficients were recovered and testing differences with the Wilcoxon signed-rank test. Equating performance for tests containing testlets was examined under several conditions: the accuracy of the estimation of item parameters (AEIP), the number of examinees, and the degree of local dependence. Results obtained with the 2PTM were compared with those from the standard IRT model, the 2PLM, which does not account for local dependence among items from a common testlet. The 2PTM recovered the equating coefficients better than the 2PLM, and the differences were statistically significant in most conditions, so the 2PTM is suitable for equating testlet-based tests.
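To make the model and the role of the equating coefficients concrete, here is a minimal sketch. It assumes a commonly used 2PTM parameterization in which the logit is a(θ − b − γ), with γ the examinee's random effect for the item's testlet and no 1.7 scaling constant; the abstract does not state the exact form used in the paper. It also assumes the usual linear link θ* = Aθ + B, under which a* = a/A, b* = Ab + B, and the testlet effect, being on the θ metric, becomes Aγ. The item parameters and the values of A and B are hypothetical.

```python
import math

def p_2ptm(theta, a, b, gamma=0.0):
    """Probability of a correct response under the assumed 2PTM form.

    gamma is the examinee's random effect for the testlet containing the item;
    with gamma = 0 the expression reduces to the ordinary 2PL model.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b - gamma)))

def to_base_scale(a, b, gamma, A, B):
    """Rescale new-form parameters with the linear link theta* = A * theta + B.

    Discrimination is divided by A, difficulty is mapped like theta, and the
    testlet effect (expressed on the theta metric) is multiplied by A.
    """
    return a / A, A * b + B, A * gamma

# Invariance check: linking the parameters leaves response probabilities unchanged.
A, B = 1.4, -0.2          # hypothetical equating coefficients
a, b, gamma = 1.2, 0.3, 0.5  # hypothetical item and testlet-effect values
a_s, b_s, g_s = to_base_scale(a, b, gamma, A, B)
for theta in (-2.0, 0.0, 1.5):
    p_new = p_2ptm(theta, a, b, gamma)
    p_base = p_2ptm(A * theta + B, a_s, b_s, g_s)
    print(f"theta={theta:+.1f}  new scale={p_new:.4f}  base scale={p_base:.4f}")
```

Because a/A · (Aθ + B − (Ab + B) − Aγ) = a(θ − b − γ), the two columns agree; equating amounts to estimating the A and B that make this hold for the common items.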
In addition, the results of six different equating criteria for the 2PTM were compared with each other. In general, SLcrit performed best when the equating coefficient A was between 0.5 and 0.9, SQRcrit when A was between 0.9 and 1.5, and Hcrit when A was between 1.5 and 2.0. The higher the AEIP, the better SQRcrit and SLcrit performed. Hcrit and SQRcrit were well suited to large testlet effects. LCcrit, Wcrit and SREcrit rarely outperformed the others.
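The abstract names SLcrit and Hcrit without defining them; they presumably denote the Stocking-Lord and Haebara characteristic curve criteria. The sketch below shows one-directional versions of those two criteria for a set of common items, under stated assumptions: the a(θ − b − γ) metric, item curves evaluated at γ = 0 (the mean of the testlet effect, a simplification that may differ from the paper's procedure), a fixed grid of 13 θ points with standard-normal weights, and hypothetical item parameters. SQRcrit, LCcrit, Wcrit and SREcrit are not defined in the abstract, so they are not sketched.

```python
import math

def icc(theta, a, b):
    """Item curve at gamma = 0 (simplifying assumption for the 2PTM)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def haebara_crit(A, B, base_items, new_items, thetas, weights):
    """Haebara-type criterion: weighted squared gaps between item curves.

    base_items and new_items hold (a, b) for the same common items, in the same order.
    """
    total = 0.0
    for (a_x, b_x), (a_y, b_y) in zip(base_items, new_items):
        for t, w in zip(thetas, weights):
            diff = icc(t, a_x, b_x) - icc(t, a_y / A, A * b_y + B)
            total += w * diff * diff
    return total

def stocking_lord_crit(A, B, base_items, new_items, thetas, weights):
    """Stocking-Lord-type criterion: weighted squared gap between test characteristic curves."""
    total = 0.0
    for t, w in zip(thetas, weights):
        tcc_base = sum(icc(t, a, b) for a, b in base_items)
        tcc_new = sum(icc(t, a / A, A * b + B) for a, b in new_items)
        total += w * (tcc_base - tcc_new) ** 2
    return total

# Theta grid with normalized standard-normal weights (an assumed quadrature choice).
thetas = [-3.0 + 0.5 * k for k in range(13)]
raw = [math.exp(-t * t / 2.0) for t in thetas]
raw_total = sum(raw)
weights = [w / raw_total for w in raw]

# Hypothetical common items: base-form values generated from the new form with known A, B.
true_A, true_B = 1.2, 0.4
new_items = [(0.8, -1.0), (1.1, -0.3), (1.5, 0.2), (0.9, 0.8), (1.3, 1.4)]
base_items = [(a / true_A, true_A * b + true_B) for a, b in new_items]

for A, B in ((true_A, true_B), (1.0, 0.0)):
    h = haebara_crit(A, B, base_items, new_items, thetas, weights)
    sl = stocking_lord_crit(A, B, base_items, new_items, thetas, weights)
    print(f"A={A:.2f} B={B:+.2f}  Haebara={h:.6f}  Stocking-Lord={sl:.6f}")
```

In practice (A, B) would be estimated by minimizing either criterion with a general-purpose optimizer such as `scipy.optimize.minimize`; this sketch only checks that each criterion vanishes at the generating coefficients and is positive elsewhere.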
