This study aims to identify the equating method with the least equating error among the equating methods of Classical Test Theory (CTT) and Item Response Theory (IRT). Data from booklet 1 and booklet 3 of the PISA (Programme for International Student Assessment) 2012 Mathematics test were used. Data from Turkey, Indonesia, Shanghai/China and Finland, countries participating in PISA 2012, were selected for this study. A non-equivalent groups design was used in the test equating process. In CTT, linear equating methods [Tucker (w1 = 1, w1 = 0.5), Levine observed score (w1 = 1, w1 = 0.5), Levine true score, Classical Congeneric and Braun-Holland] and equipercentile equating methods [presmoothing with polynomial degree C = 6, beta4, postsmoothing with a cubic spline at S = 0.05, frequency estimation (w1 = 1, w1 = 0.5)] were used. Among the CTT methods, the least error was obtained from the frequency estimation method with a synthetic universe weight of w1 = 0.5. For IRT, the calibration method was decided first, and the Stocking-Lord method was chosen. After the scale transformation was carried out with the Stocking-Lord method, equated scores were calculated with the IRT true score and IRT observed score equating methods. The least error in IRT was obtained from the true score equating method. Equating error was quantified with error coefficients calculated according to Newton-Raphson's delta method and the bootstrap method. When the error coefficients (delta and bootstrap) of the equating methods in both theories were compared, the IRT-based equating methods had less error than the CTT-based methods, and the method with the least equating error overall was IRT true score equating. Within CTT, frequency estimation (w1 = 0.5) had the least equating error and Levine true score equating had the most.
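As a rough illustration of two of the CTT ingredients named above, the sketch below implements Tucker linear equating with a synthetic population weight w1, together with a bootstrap standard error, using the standard textbook formulas for an anchor-test design. It is not the study's code. The function names (`tucker_linear`, `bootstrap_se`), the data layout (lists of (total score, anchor score) pairs) and the default of 200 replications are assumptions made for this example.

```python
import random
from statistics import fmean


def _moments(a, b):
    """Return mean(a), mean(b), var(a), var(b), cov(a, b) (divide-by-n form)."""
    n = len(a)
    ma, mb = fmean(a), fmean(b)
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((y - mb) ** 2 for y in b) / n
    cab = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    return ma, mb, va, vb, cab


def tucker_linear(x, group1, group2, w1=0.5):
    """Equate form-X score x to the form-Y scale (Tucker method).

    group1: (X, V) pairs from the population that took form X (assumed layout).
    group2: (Y, V) pairs from the population that took form Y (assumed layout).
    V is the common (anchor) item score; w1 is the synthetic weight of group 1.
    """
    w2 = 1.0 - w1
    xs, v1 = zip(*group1)
    ys, v2 = zip(*group2)
    mx, mv1, vx, vv1, cxv = _moments(xs, v1)
    my, mv2, vy, vv2, cyv = _moments(ys, v2)

    g1 = cxv / vv1        # regression slope of X on V in group 1
    g2 = cyv / vv2        # regression slope of Y on V in group 2
    dmu = mv1 - mv2       # anchor mean difference between groups
    dvar = vv1 - vv2      # anchor variance difference between groups

    # Synthetic-population means and variances of X and Y.
    mu_sx = mx - w2 * g1 * dmu
    mu_sy = my + w1 * g2 * dmu
    var_sx = vx - w2 * g1 ** 2 * dvar + w1 * w2 * g1 ** 2 * dmu ** 2
    var_sy = vy + w1 * g2 ** 2 * dvar + w1 * w2 * g2 ** 2 * dmu ** 2

    return (var_sy / var_sx) ** 0.5 * (x - mu_sx) + mu_sy


def bootstrap_se(x, group1, group2, w1=0.5, reps=200, seed=0):
    """Bootstrap standard error of the equated score at x.

    Examinees are resampled with replacement within each group; the
    number of replications and the seed are illustrative defaults.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        b1 = rng.choices(group1, k=len(group1))
        b2 = rng.choices(group2, k=len(group2))
        estimates.append(tucker_linear(x, b1, b2, w1))
    mean_est = fmean(estimates)
    return (sum((e - mean_est) ** 2 for e in estimates) / (reps - 1)) ** 0.5
```

Calling `tucker_linear(x, group1, group2, w1=1)` and then with `w1=0.5` gives the two weightings compared in the abstract, and `bootstrap_se` gives one error coefficient per score point. The other methods listed (Levine, Braun-Holland, equipercentile, IRT) would need their own estimators.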