Abstract

When there is an interest in tracking longitudinal trends of student educational achievement using standardized tests, the most common linking approach involves the inclusion of a common set of items across adjacent test administrations. However, this approach may not be feasible in the context of high-stakes testing due to undesirable exposure of administered items. In this paper, we propose an alternative design that allows multiple operational tests with no items in common to be equated, based on the inclusion of common items in an anchor test administered in a post-test condition. We tested this approach using data from the assessment program implemented in Italy by the National Institute for the Educational Evaluation of Instruction and Training for the years 2010-2012, and from a convenience sample of 832 8th grade students. Additionally, we investigated the impact of varying item position and order across test forms on the functioning of the common items. Linking of tests was performed using multiple-group Item Response Theory modeling. Results of linking indicated that the operational tests showed little variation in difficulty over the years. Investigation of item position and order effects showed that moving common items closer to the end of the test, as well as positioning difficult items at the beginning or in the middle section of a test, led to a significant increase in the difficulty of the common items. Overall, findings indicate that this approach represents a viable linking design, which can be useful when the inclusion of common items across operational tests is not possible. The impact of differential item functioning of common items on equating error and on the ability to detect ability trends is discussed.

Highlights

  • When there is an interest in tracking longitudinal trends of student educational achievement at the population level, the most common approach employed by national and international Large-Scale Assessment Programs (LSAPs), such as the PISA [1] and IEA Trends in International Mathematics and Science Study (TIMSS, [2]) programs, is to include a common set of items across adjacent test administrations

  • The implementation of longitudinal Non-Equivalent groups Anchor Test (NEAT) designs may not be feasible in the context of high-stakes testing due to test security concerns related to the inclusion of common items across multiple operational test forms, particularly when the common items are required to contribute to test scores

  • The cross-plots in Figures 2–4 provide a visual representation of the results: overall, the difficulty parameters for the 2010 and 2011 items showed good stability across test conditions, with the majority of the items located near the identity line (a minimal sketch of this kind of cross-plot follows this list)
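
The following is a minimal, illustrative sketch of the kind of cross-plot described in the last highlight: difficulty estimates for the same common items under two test conditions plotted against each other, with the identity line marking perfect stability. The difficulty values and axis labels are hypothetical stand-ins, not the paper's data.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical Rasch difficulty estimates for the same common items,
# estimated under two test conditions (values are illustrative only).
b_operational = np.array([-1.5, -0.9, -0.2, 0.4, 0.9, 1.6])
b_anchor_test = np.array([-1.4, -1.0, -0.1, 0.5, 1.1, 1.5])

fig, ax = plt.subplots()
ax.scatter(b_operational, b_anchor_test)

# Identity line: items on this line have identical difficulty in both conditions.
lims = [min(b_operational.min(), b_anchor_test.min()) - 0.5,
        max(b_operational.max(), b_anchor_test.max()) + 0.5]
ax.plot(lims, lims, linestyle="--")
ax.set_xlim(lims)
ax.set_ylim(lims)
ax.set_xlabel("Difficulty (operational test)")
ax.set_ylabel("Difficulty (anchor test)")
ax.set_title("Stability of common-item difficulties")
plt.show()
```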



Introduction

When there is an interest in tracking longitudinal trends of student educational achievement at the population level, the most common approach employed by national and international Large-Scale Assessment Programs (LSAPs), such as the PISA [1] and IEA Trends in International Mathematics and Science Study (TIMSS, [2]) programs, is to include a common set of items across adjacent test administrations. Based on examinees' responses on the common-item set, equating procedures, such as those provided in the framework of Item Response Theory (IRT; [3,4,5]), can be implemented on the collected data to estimate the equating parameters required to put the operational tests, and the population ability estimates, on a common metric scale. This equating approach, generally referred to as the Non-Equivalent groups Anchor Test (NEAT) design, is one of the most popular and flexible tools for linking examinations in educational LSAPs [6]. One advantage of the post-equating approach over the pre-equating design is that it allows the security of all items to be preserved, as equating takes place after test administration, and all items included in the operational test are allowed to contribute to the examinees' scores.
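
To make the linking step concrete, the following is a minimal sketch of common-item linking between two administrations, assuming the simpler separate-calibration variant with a mean/sigma transformation rather than the multiple-group (concurrent) calibration used in this paper; the difficulty values and variable names are illustrative only.

```python
import numpy as np

# Illustrative difficulty estimates for the same anchor items, calibrated
# separately in two adjacent administrations (values are made up).
b_old = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])   # base-year metric
b_new = np.array([-1.0, -0.2, 0.3, 1.0, 1.8])   # new-year metric

# Mean/sigma linking: find the linear transformation b_old ≈ A * b_new + B
# that matches the mean and standard deviation of the anchor-item difficulties.
A = b_old.std(ddof=1) / b_new.std(ddof=1)
B = b_old.mean() - A * b_new.mean()

# Put the new administration's item difficulties (and, via the same
# transformation, its ability estimates) on the base-year scale.
b_new_linked = A * b_new + B
print(f"A = {A:.3f}, B = {B:.3f}")
print("linked difficulties:", np.round(b_new_linked, 3))
```

Under a pure Rasch model the slope is often fixed at A = 1 (mean/mean linking), so only the shift B is estimated; the sketch keeps the general linear form used with models that estimate discrimination parameters.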

