Abstract

In this research, computerized adaptive testing item selection methods were investigated in relation to ability estimation methods and test termination rules. For this purpose, an item pool of 250 items and 2000 examinees (M = 0, SD = 1) were simulated. A total of thirty computerized adaptive testing (CAT) conditions were created by crossing item selection methods (Maximum Fisher Information, a-stratification, Likelihood Weight Information Criterion, Gradual Information Ratio, and Kullback-Leibler), ability estimation methods (Maximum Likelihood Estimation and Expected a Posteriori), and test termination rules (40 items, SE < .20, and SE < .40). Under the fixed test-length stopping rule, the SE values obtained with the Maximum Likelihood Estimation method were higher than those obtained with the Expected a Posteriori ability estimation method. When the ability estimation method was Maximum Likelihood, the highest SE value was obtained from the a-stratification item selection method when the test length was smaller than 30 items, whereas the Kullback-Leibler item selection method yielded the highest SE value when the test length was larger than 30 items. With the Expected a Posteriori ability estimation method, the highest SE value was obtained from the a-stratification item selection method at all test lengths. In the conditions where the test termination rule was SE < .20 and the Maximum Likelihood Estimation method was used, the lowest and highest average numbers of items were obtained from the Gradual Information Ratio and Maximum Fisher Information item selection methods, respectively. When the termination rule was SE < .20 and the Expected a Posteriori ability estimation method was used, the lowest average number of items was obtained with Kullback-Leibler and the highest with the Likelihood Weight Information Criterion item selection method. In the conditions where the test termination rule was SE < .40 and the ability estimation method was Maximum Likelihood Estimation, the maximum and minimum numbers of items were obtained with the Maximum Fisher Information and Kullback-Leibler item selection methods, respectively. When Expected a Posteriori ability estimation was used under the same rule, the maximum and minimum numbers of items were obtained with the Maximum Fisher Information and a-stratification item selection methods, respectively. For both the SE < .20 and SE < .40 stopping rules, the average number of items was highest when the Maximum Likelihood Estimation method was used, across all item selection methods.
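As a rough illustration of the simulation design just described (a 250-item pool and 2000 simulees with M = 0, SD = 1), the Python sketch below generates such data under an assumed 2PL model; the item-parameter distributions are illustrative assumptions, since the excerpt does not specify how the pool was actually generated.

    import numpy as np

    rng = np.random.default_rng(42)
    n_items, n_examinees = 250, 2000
    # Illustrative 2PL item parameters; the excerpt does not state the
    # distributions actually used to generate the pool.
    a = rng.uniform(0.5, 2.0, n_items)         # discrimination
    b = rng.normal(0.0, 1.0, n_items)          # difficulty
    theta = rng.normal(0.0, 1.0, n_examinees)  # true abilities, M = 0, SD = 1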

Highlights

  • This study aims to answer the following questions: 1) How do standard errors associated with the item selection methods (Maximum Fisher Information, a-stratification, Likelihood Weight Information Criterion, Gradual Information Ratio, and Kullback-Leibler) differ in terms of (a) test length (5, 10, 20, 30 and 40 items) and (b) ability estimation method (Maximum Likelihood and Expected a Posteriori)?

  • To determine how the standard errors associated with the different item selection methods (MFI, Gradual Information Ratio (GIR), Likelihood Weight Information Criterion (LWIC), a-stratification, and KL) vary with test length (5, 10, 20, 30 and 40 items) and ability estimation method (MLE and Expected a Posteriori (EAP)), the means of the interim ability estimates (θ) were used in the analysis of the results (see the SE formula after this list)

  • At the beginning of the Computerized Adaptive Test (CAT) conditions in which the Maximum Likelihood Estimation (MLE) ability estimation method was used, the lowest standard error (SE) value was obtained from the GIR item selection method after five items were administered (n = 5). The a-stratification item selection method showed the highest SE value when the test length was shorter than 30 items (n < 30), and KL showed the highest SE value when the test length was longer than 30 items (n > 30)
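The SE comparisons above rest on a standard IRT relationship (a general result, not a formula quoted from this excerpt): under MLE, the standard error of the ability estimate is the inverse square root of the test information accumulated over the n administered items,

    SE(\hat{\theta}) = 1 / \sqrt{\sum_{i=1}^{n} I_i(\hat{\theta})}

where I_i(\hat{\theta}) is the Fisher information of item i at the current ability estimate. This is why SE shrinks as test length grows, and why more informative item selection reaches a given SE threshold with fewer items.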

Introduction

The Computerized Adaptive Test (CAT) algorithm consists of administering selected items to the examinee in a computer environment, estimating the examinee's ability level from the responses given, selecting new items according to the most recent ability estimate, and continuing the test until the specified termination rule is met (Orcutt, 2002; Thissen & Mislevy, 2000; Wainer, 2000; Weiss, 1983). A key question for CAT is how the first item is selected to start the test (Wainer, 2000), and there are different methods for selecting this first item. The most commonly used ability estimation methods in CAT applications are Maximum Likelihood and Bayesian-based estimation. The major item selection methods used in CAT applications are Maximum Fisher Information (MFI), a-stratification, Likelihood Weight Information Criterion (LWIC), Gradual Information Ratio (GIR), and Kullback-Leibler (KL).
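To make the cycle concrete, here is a minimal sketch of the CAT loop described above, assuming a 2PL IRT model, MFI item selection, EAP scoring, and a combined fixed-length/SE stopping rule; the function names, parameter choices, and model are illustrative assumptions, not the study's actual implementation.

    import numpy as np

    def prob_2pl(theta, a, b):
        # Probability of a correct response under the 2PL model.
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    def item_information(theta, a, b):
        # Fisher information of a 2PL item at ability level theta.
        p = prob_2pl(theta, a, b)
        return a ** 2 * p * (1.0 - p)

    def eap_estimate(responses, a, b, grid=np.linspace(-4, 4, 81)):
        # EAP ability estimate with a standard normal prior on a quadrature grid.
        prior = np.exp(-grid ** 2 / 2.0)
        like = np.ones_like(grid)
        for u, ai, bi in zip(responses, a, b):
            p = prob_2pl(grid, ai, bi)
            like *= p ** u * (1.0 - p) ** (1 - u)
        post = prior * like
        post /= post.sum()
        theta = (grid * post).sum()
        se = np.sqrt(((grid - theta) ** 2 * post).sum())  # posterior SD as SE
        return theta, se

    def run_cat(a, b, true_theta, rng, max_items=40, se_stop=0.20):
        # Administer items until SE < se_stop or max_items is reached.
        administered, responses = [], []
        theta, se = 0.0, np.inf  # start from the prior mean
        while len(administered) < max_items and se >= se_stop:
            pool = np.setdiff1d(np.arange(len(a)), administered)
            info = item_information(theta, a[pool], b[pool])
            nxt = int(pool[np.argmax(info)])  # MFI: most informative item
            administered.append(nxt)
            correct = rng.random() < prob_2pl(true_theta, a[nxt], b[nxt])
            responses.append(int(correct))
            theta, se = eap_estimate(responses, a[administered], b[administered])
        return theta, se, len(administered)

With an item pool such as the one sketched earlier, run_cat(a, b, true_theta=0.5, rng=rng) would return the final ability estimate, its SE, and the number of items used; swapping item_information for another criterion gives the other selection rules compared in the study.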
