Abstract

In recent years, there has been growing interest in and extensive use of computerized adaptive testing (CAT), especially in large-scale assessments. Numerous simulation studies have been conducted on both real and simulated data sets to determine optimum conditions and develop CAT versions. As one of the most popular large-scale assessment programs, the Trends in International Mathematics and Science Study (TIMSS) has been administered as a paper-and-pencil test to monitor student achievement in mathematics and science at the fourth and eighth grade levels since 1995. The purpose of this study is to investigate the optimum CAT algorithm for the TIMSS eighth grade mathematics assessments. Since Turkey and the USA participated in the 2007, 2011 and 2015 administrations, their data were combined and 393 items were calibrated on the same scale using the marginal maximum likelihood estimation method. With this item pool, several scenarios were proposed and tested to determine the optimum starting rule, ability estimation method, and test termination rule, as well as the efficiency of the exposure control method. The results indicated that estimating abilities with the expected a posteriori method after 6 random items and terminating the fixed-length test after 20 items was the optimum algorithm for the TIMSS eighth grade mathematics assessments. It was also found that item exposure control was of primary importance for the effective use of the item pool. This study has implications for both national and international large-scale test developers in determining the optimum CAT algorithm and its consequences compared with paper-and-pencil versions.

Highlights

  • Educational testing mainly focused on traditional paper-and-pencil tests until technological developments supported the emergence of computers

  • Unlike traditional tests, in which all participants take a single form, the computerized adaptive testing (CAT) algorithm tailors items to each participant's response pattern (Sireci, Baldwin, Martone, Kaira, Lam, & Hambleton, 2008), so many different test forms can be created during the test

  • Simulations were conducted to compare the effects of the specified conditions on variable-length tests, so the average test length and the correlation coefficient between true and estimated theta values were calculated


Introduction

Educational testing mainly focused on traditional paper-and-pencil tests until technological developments supported the emergence of computers. Instead of administering the same set of items to all participants, different test forms can be assembled in computer-based testing. This becomes meaningful when the participant's cumulative performance on earlier items determines the selection of newer items (Davey & Pitoniak, 2006). A correct response is followed by a more difficult item, and an incorrect response is followed by an easier item (Hambleton, Swaminathan, & Rogers, 1991; Luecht & Sireci, 2012; van der Linden, 2010). This optimization process continues until the test administrators have enough certainty that sufficient information about the participant's ability level has been collected. Unlike traditional tests, in which all participants take a single form, the CAT algorithm tailors items to each participant's response pattern (Sireci, Baldwin, Martone, Kaira, Lam, & Hambleton, 2008), so many different test forms can be created during the test.
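The adaptive loop described above can be sketched in a few lines of code. The sketch below is a minimal illustration only, assuming a Rasch (1PL) item response model, expected a posteriori (EAP) ability estimation, and maximum-information item selection after a random start; the function names and the item pool are hypothetical, and the study's actual algorithm may differ in its details.

```python
import math
import random

def prob_correct(theta, b):
    """Rasch (1PL) probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def eap_estimate(responses, difficulties):
    """EAP ability estimate over a discrete theta grid, with a standard normal prior."""
    grid = [i / 10.0 for i in range(-40, 41)]  # theta in [-4, 4]
    posterior = []
    for theta in grid:
        p = math.exp(-theta * theta / 2.0)  # unnormalised N(0, 1) prior
        for u, b in zip(responses, difficulties):
            pc = prob_correct(theta, b)
            p *= pc if u == 1 else (1.0 - pc)
        posterior.append(p)
    total = sum(posterior)
    return sum(t * p for t, p in zip(grid, posterior)) / total

def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta."""
    p = prob_correct(theta, b)
    return p * (1.0 - p)

def run_cat(true_theta, pool, n_start=6, test_length=20, seed=0):
    """Fixed-length CAT: n_start random items, then maximum-information
    selection, with the EAP ability estimate updated after each response."""
    rng = random.Random(seed)
    remaining = list(range(len(pool)))
    administered, responses = [], []
    theta_hat = 0.0
    for step in range(test_length):
        if step < n_start:
            idx = rng.choice(remaining)  # random starting rule
        else:
            idx = max(remaining, key=lambda i: item_information(theta_hat, pool[i]))
        remaining.remove(idx)
        administered.append(idx)
        # simulate the examinee's response under the true ability
        u = 1 if rng.random() < prob_correct(true_theta, pool[idx]) else 0
        responses.append(u)
        theta_hat = eap_estimate(responses, [pool[i] for i in administered])
    return theta_hat, administered
```

In this sketch, a correct response raises the interim estimate and so steers selection toward harder items, while an incorrect response does the opposite; a fixed test length of 20 items serves as the termination rule. Exposure control, which the study found to be important, is deliberately omitted here for brevity.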

