Abstract

When there is an interest in tracking longitudinal trends of student educational achievement using standardized tests, the most common linking approach involves the inclusion of a common set of items across adjacent test administrations. However, this approach may not be feasible in the context of high-stakes testing due to undesirable exposure of administered items. In this paper, we propose an alternative design that allows multiple operational tests with no items in common to be equated, based on the inclusion of common items in an anchor test administered in a post-test condition. We tested this approach using data from the assessment program implemented in Italy by the National Institute for the Educational Evaluation of Instruction and Training for the years 2010-2012, and from a convenience sample of 832 8th grade students. Additionally, we investigated the impact of varying item position and order across test forms on the functioning of the common items. Linking of tests was performed using multiple-group Item Response Theory modeling. Results of linking indicated that the operational tests showed little variation in difficulty over the years. Investigation of item position and order effects showed that moving common items closer to the end of the test, as well as positioning difficult items at the beginning or in the middle section of a test, led to a significant increase in the difficulty of the common items. Overall, findings indicate that this approach represents a viable linking design, which can be useful when the inclusion of common items across operational tests is not possible. The impact of differential item functioning of common items on equating error and on the ability to detect ability trends is discussed.

Highlights

  • When there is an interest in tracking longitudinal trends of student educational achievement at the population level, the most common approach employed by national and international Large-Scale Assessment Programs (LSAPs), such as the PISA [1] and IEA Trends in International Mathematics and Science Study (TIMSS, [2]) programs, is to include a common set of items across adjacent test administrations

  • The implementation of longitudinal Non-Equivalent groups Anchor Test (NEAT) designs may not be feasible in the context of high-stakes testing due to test security concerns related to the inclusion of common items across multiple operational test forms, particularly when the common items are required to contribute to test scores

  • The cross-plots in Figures 2–4 provide a visual representation of the results: overall, the difficulty parameters for the 2010 and 2011 items showed good stability across test conditions, with the majority of the items located near the identity line (a minimal sketch of this kind of cross-plot follows this list)
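
The following is a minimal, illustrative sketch of the kind of cross-plot described in the last highlight: difficulty estimates for the same common items under two test conditions plotted against each other, with the identity line marking perfect stability. The difficulty values and axis labels are hypothetical stand-ins, not the paper's data.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical Rasch difficulty estimates for the same common items,
# estimated under two test conditions (values are illustrative only).
b_operational = np.array([-1.5, -0.9, -0.2, 0.4, 0.9, 1.6])
b_anchor_test = np.array([-1.4, -1.0, -0.1, 0.5, 1.1, 1.5])

fig, ax = plt.subplots()
ax.scatter(b_operational, b_anchor_test)

# Identity line: items on this line have identical difficulty in both conditions.
lims = [min(b_operational.min(), b_anchor_test.min()) - 0.5,
        max(b_operational.max(), b_anchor_test.max()) + 0.5]
ax.plot(lims, lims, linestyle="--")
ax.set_xlim(lims)
ax.set_ylim(lims)
ax.set_xlabel("Difficulty (operational test)")
ax.set_ylabel("Difficulty (anchor test)")
ax.set_title("Stability of common-item difficulties")
plt.show()
```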



Introduction

When there is an interest in tracking longitudinal trends of student educational achievement at the population level, the most common approach employed by national and international Large-Scale Assessment Programs (LSAPs), such as the PISA [1] and IEA Trends in International Mathematics and Science Study (TIMSS, [2]) programs, is to include a common set of items across adjacent test administrations. Based on examinees' responses on the common-item set, equating procedures, such as those provided in the framework of Item Response Theory (IRT; [3,4,5]), can be implemented on the collected data to estimate the equating parameters required to put the operational tests, and the population ability estimates, on a common metric scale. This equating approach, generally referred to as the Non-Equivalent groups Anchor Test (NEAT) design, is one of the most popular and flexible tools for linking examinations in educational LSAPs [6]. One advantage of the post-equating approach over the pre-equating design is that it allows the security of all items to be preserved, as equating takes place after test administration, and all items included in the operational test are allowed to contribute to the examinees' scores.
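
To make the linking step concrete, the following is a minimal sketch of common-item linking between two administrations, assuming the simpler separate-calibration variant with a mean/sigma transformation rather than the multiple-group (concurrent) calibration used in this paper; the difficulty values and variable names are illustrative only.

```python
import numpy as np

# Illustrative difficulty estimates for the same anchor items, calibrated
# separately in two adjacent administrations (values are made up).
b_old = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])   # base-year metric
b_new = np.array([-1.0, -0.2, 0.3, 1.0, 1.8])   # new-year metric

# Mean/sigma linking: find the linear transformation b_old ≈ A * b_new + B
# that matches the mean and standard deviation of the anchor-item difficulties.
A = b_old.std(ddof=1) / b_new.std(ddof=1)
B = b_old.mean() - A * b_new.mean()

# Put the new administration's item difficulties (and, via the same
# transformation, its ability estimates) on the base-year scale.
b_new_linked = A * b_new + B
print(f"A = {A:.3f}, B = {B:.3f}")
print("linked difficulties:", np.round(b_new_linked, 3))
```

Under a pure Rasch model the slope is often fixed at A = 1 (mean/mean linking), so only the shift B is estimated; the sketch keeps the general linear form used with models that estimate discrimination parameters.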

