Abstract

The purposes of this research are: 1) to compare two test equating procedures conducted with the Haebara and the Stocking-Lord methods; and 2) to describe the characteristics of each equating method using the IRTEQ program for Windows. This research employs a participatory approach, as the data are collected through questionnaires based on the 2018 National Examination Administration. The samples are divided into group A (449 respondents) and group B (502 respondents). This paper discusses how to equate the two test forms through the anchor (common-item) method, with an instrument consisting of 35 questionnaire items and 6 shared anchor items. PARSCALE is used to estimate each respondent's ability and each item's characteristics, and the shared items are then equated using the IRTEQ program. The results show a significant difference between the two methods: the Haebara method produces the larger mean-sigma value (0.592), while the Stocking & Lord method produces 0.00213. The results also show that the shared test items can improve item discrimination and raise the difficulty level (parameter b). Given the availability of shared items, it is appropriate to equate two different tests administered to groups with different theta (ability) distributions.
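As a minimal sketch of how the two linking criteria compared in the abstract differ, the snippet below uses a 2PL model and entirely hypothetical anchor-item parameters (none of these values come from the paper): the Haebara criterion minimizes squared differences between item characteristic curves, while the Stocking-Lord criterion minimizes squared differences between the summed test characteristic curves, both over the linking constants A and B.

```python
# Hedged sketch: Haebara vs. Stocking-Lord linking for a 2PL model.
# All parameter values are hypothetical, for illustration only.
import numpy as np
from scipy.optimize import minimize

# Anchor-item parameters (discrimination a, difficulty b) on two scales.
a_old, b_old = np.array([1.1, 0.8, 1.4]), np.array([-0.5, 0.2, 1.0])  # base form
a_new, b_new = np.array([1.0, 0.9, 1.3]), np.array([-0.3, 0.4, 1.2])  # form to link

theta = np.linspace(-4, 4, 41)  # quadrature points on the ability scale

def p2pl(theta, a, b):
    """2PL item response probabilities, shape (items, theta points)."""
    return 1.0 / (1.0 + np.exp(-a[:, None] * (theta[None, :] - b[:, None])))

def haebara_loss(AB):
    A, B = AB
    # Rescale the linked form's anchors onto the base scale: a/A, A*b + B.
    diff = p2pl(theta, a_old, b_old) - p2pl(theta, a_new / A, A * b_new + B)
    return np.sum(diff ** 2)  # item-by-item ICC differences

def stocking_lord_loss(AB):
    A, B = AB
    diff = (p2pl(theta, a_old, b_old).sum(axis=0)
            - p2pl(theta, a_new / A, A * b_new + B).sum(axis=0))
    return np.sum(diff ** 2)  # test characteristic curve differences

for name, loss in [("Haebara", haebara_loss), ("Stocking-Lord", stocking_lord_loss)]:
    res = minimize(loss, x0=[1.0, 0.0])
    print(f"{name}: A = {res.x[0]:.3f}, B = {res.x[1]:.3f}")
```

This is only an unweighted illustration; the study itself obtains these constants from IRTEQ rather than a hand-rolled optimizer.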

Highlights

  • Scoring is one of the most important components in the education system

  • The purposes of this research are: 1) to compare two test equating procedures conducted with the Haebara and Stocking-Lord methods; 2) to describe the characteristics of each equating method using the IRTEQ program for Windows

  • The equating method based on item response theory serves to determine the conversion constants


Summary

Introduction

Scoring is one of the most important components in the education system. Scoring results may reflect the development or progress of educational outputs when compared over time, between schools, or between districts. The procedure for a test equating activity based on item response theory is as follows: 1) estimating the item parameters and the ability parameters; 2) placing the item response theory scales on a common metric through a linear transformation; and 3) equating the scores. There are three data-collection designs for conducting a test equating (Brennan & Kolen, 2004): 1) data collected from two groups tested with different packages built from the same content outline, in which the packages are distributed randomly; 2) a single-group design, in which one of the tested groups is given package A, then package B, and then package A once again; and 3) a common-item design, in which different test instruments are given to different test takers but both packages contain an anchor test. In research conducted by Rahmawati (2015), equivalence analyzed under a criterion of a 0.5-point difference in the raw-score TCC led to 100% consistency in the graduation classification.
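For step 2 above, the linear transformation that places one form's scale on another's can be illustrated with the mean-sigma method applied to anchor-item difficulties. This is only a sketch with hypothetical values; in the study, the parameters would come from separate PARSCALE calibrations and the linking would be carried out in IRTEQ.

```python
# Hedged sketch of the mean-sigma linear transformation over anchor items.
# The difficulty values are hypothetical; real values would come from
# separate calibrations (e.g., PARSCALE runs for group A and group B).
import numpy as np

b_ref  = np.array([-0.6, -0.1, 0.3, 0.7, 1.1, 1.5])  # anchor difficulties, reference form
b_link = np.array([-0.4,  0.1, 0.5, 0.9, 1.3, 1.8])  # same anchors, form to be linked

# Conversion constants: theta_ref = A * theta_link + B
A = b_ref.std(ddof=1) / b_link.std(ddof=1)
B = b_ref.mean() - A * b_link.mean()

# Apply the transformation to the linked form's parameters.
b_on_ref = A * b_link + B        # difficulties: b* = A*b + B
# a_on_ref = a_link / A          # discriminations: a* = a / A
print(f"A = {A:.3f}, B = {B:.3f}")
```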

