Abstract

This study investigates the usefulness of item-score reliability as a criterion for item selection in test construction. Methods MS, λ6, and CA were investigated as item-assessment methods in item selection and compared to the corrected item-total correlation, which was used as a benchmark. An ideal ordering for adding items to the test (bottom-up procedure) or omitting items from the test (top-down procedure) was defined based on the population test-score reliability. The orderings the four item-assessment methods produced in samples were compared to the ideal ordering, and the degree of resemblance was expressed by means of Kendall's τ. To investigate the concordance of the orderings across 1,000 replicated samples, Kendall's W was computed for each item-assessment method. The results showed that for both the bottom-up and the top-down procedures, item-assessment method CA and the corrected item-total correlation most closely resembled the ideal ordering. Generally, all item-assessment methods resembled the ideal ordering better, and concordance of the orderings was greater, for larger sample sizes and greater variance of the item discrimination parameters.
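To make the comparison procedure concrete, the sketch below shows how a sample-based item ordering can be compared with an ideal ordering using Kendall's τ, and how agreement among orderings from replicated samples can be summarized with Kendall's W. This is a minimal Python illustration with made-up rank vectors, not the authors' analysis code.

```python
# Minimal sketch (not the authors' code): comparing a sample-based item ordering
# with the ideal ordering via Kendall's tau, and summarizing the agreement of
# orderings across replicated samples via Kendall's W.
# The rank arrays below are invented for illustration, not data from the study.
import numpy as np
from scipy.stats import kendalltau

# Ranks assigned to the same 5 items: ideal ordering (based on population
# test-score reliability) versus the ordering produced in one sample.
ideal_ranks = np.array([1, 2, 3, 4, 5])
sample_ranks = np.array([2, 1, 3, 5, 4])

tau, _ = kendalltau(ideal_ranks, sample_ranks)
print(f"Kendall's tau with the ideal ordering: {tau:.2f}")

def kendalls_w(rank_matrix: np.ndarray) -> float:
    """Kendall's coefficient of concordance W for an (m replications x n items)
    matrix of ranks, assuming no tied ranks."""
    m, n = rank_matrix.shape
    rank_sums = rank_matrix.sum(axis=0)              # rank sum per item
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()  # squared deviations from mean
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

# Orderings of the same 5 items produced in three replicated samples.
replicated_orderings = np.array([
    [1, 2, 3, 4, 5],
    [2, 1, 3, 4, 5],
    [1, 3, 2, 4, 5],
])
print(f"Kendall's W across replications: {kendalls_w(replicated_orderings):.2f}")
```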

Highlights

  • Measurements obtained by tests are only trustworthy if the quality of the test meets certain standards

  • CA and the corrected item-total correlation showed the highest mean τ-values, meaning that these item-assessment methods resembled the ordering based on ρ_XX′ best

  • This study investigated the usefulness of item-score reliability methods for selecting items with the aim of producing either a longer or a shorter test

Introduction

Measurements obtained by tests are only trustworthy if the quality of the test meets certain standards. When adapting an existing test, the test constructor may wish to increase or decrease the number of items for various reasons. For example, the existing test may be too short, resulting in test-score reliability that is too low; in this case, adding items to the test may increase test-score reliability. Test constructors could use the reliability of individual items to make decisions about which items to add to the test or remove from it. This article investigates the usefulness of item-score reliability methods for making informed decisions about items to add or remove when adapting a test.
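As a brief illustration of why adding items can raise test-score reliability (not part of the article itself), the classical Spearman-Brown prophecy formula predicts the reliability of a test lengthened by a factor k, assuming the added items are of comparable quality to the existing ones. A minimal Python sketch:

```python
# Minimal sketch (not from the article): the Spearman-Brown prophecy formula,
# which predicts test-score reliability when a test is lengthened by factor k
# with items of comparable quality to the existing items.
def spearman_brown(reliability: float, k: float) -> float:
    """Predicted reliability of a test lengthened by factor k."""
    return k * reliability / (1.0 + (k - 1.0) * reliability)

# Example: doubling a test with current reliability 0.70 is predicted
# to raise the test-score reliability to about 0.82.
print(round(spearman_brown(0.70, 2.0), 2))
```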
