Anchor Test Design Research Articles

ABSTRACT
Using self‐reported but empirically verified repeater groups, we analyzed large amounts of operational test data across a wide range of administrations of a graduate admissions examination administered in a non‐English language. The goal was to investigate repeater effects on score equating under the nonequivalent groups with anchor test (NEAT) design. Both linear and nonlinear equating models were used to derive the equating functions for the various study groups. We evaluated scaled score differences between equating in the total group, the repeater group, and the first‐timer group. For this we used simple-difference statistics and subpopulation invariance measures developed and widely used over the last 10 years. Standard errors of the statistics summarizing scaled score differences were estimated by simulation, which provided statistical criteria for judging whether equating differences were significant. In addition, we used scaled score differences that were critical to admissions screening as criteria for evaluating the practical significance of equating differences. To put the investigation of repeater effects in proper perspective, we analyzed the repeater data in depth to understand repeater performance trends. Overall, we found no significant effects of repeater performance on score equating for this exam. Although many of the equating differences were practically significant, most of them were not statistically significant. However, further research with larger repeater samples is recommended to help explain the practically significant equating differences consistently observed for the repeater group in this study. Potential problems associated with small repeater sample sizes, issues with the practical criterion for evaluating the significance of equating differences, and study limitations are also discussed.
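To make the subpopulation invariance comparison concrete, here is a minimal sketch. It assumes the root mean square difference (RMSD) and root expected mean square difference (REMSD) indices of Dorans and Holland. In those indices, each subgroup's equating function e_j(x) is compared with the total-group function e(x), weighted by subgroup proportions w_j. The abstract does not state which exact indices, weights, or screening thresholds the study used. The function names, the `threshold` criterion, and the example numbers below are therefore hypothetical.

```python
from math import sqrt


def rmsd(total_eq, sub_eqs, weights, sd_y=1.0):
    """Root mean square difference at each raw score point x.

    total_eq: total-group equated scores e(x), one per raw score point.
    sub_eqs:  dict subgroup name -> list of subgroup equated scores e_j(x).
    weights:  dict subgroup name -> subgroup proportion w_j (summing to 1).
    sd_y:     pass the total-group SD of Y to standardize; 1.0 keeps score units.
    """
    out = []
    for i, e in enumerate(total_eq):
        ss = sum(weights[g] * (sub_eqs[g][i] - e) ** 2 for g in sub_eqs)
        out.append(sqrt(ss) / sd_y)
    return out


def remsd(total_eq, sub_eqs, weights, rel_freq, sd_y=1.0):
    """Root expected mean square difference: squared subgroup differences
    averaged over the total-group relative frequencies rel_freq[x]."""
    ss = 0.0
    for i, e in enumerate(total_eq):
        ss += rel_freq[i] * sum(weights[g] * (sub_eqs[g][i] - e) ** 2 for g in sub_eqs)
    return sqrt(ss) / sd_y


def flag_practical(total_eq, sub_eqs, threshold):
    """Indices of score points where any subgroup's equated score differs
    from the total-group score by at least a (hypothetical) practical threshold."""
    return [i for i, e in enumerate(total_eq)
            if any(abs(sub_eqs[g][i] - e) >= threshold for g in sub_eqs)]


# Hypothetical equated scaled scores at three raw score points.
total = [10.0, 20.0, 30.0]
subs = {"repeaters": [11.0, 21.0, 30.0], "first_timers": [9.8, 19.8, 30.0]}
w = {"repeaters": 0.2, "first_timers": 0.8}
print(rmsd(total, subs, w))
print(remsd(total, subs, w, [0.25, 0.5, 0.25]))
print(flag_practical(total, subs, 0.5))
```

Keeping `sd_y` at 1.0 reports differences in scaled score units. That makes them directly comparable to a practical criterion such as a screening cut difference. Passing the total-group standard deviation gives the standardized form. Judging statistical significance would additionally require standard errors, which the study estimated by simulation.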

ABSTRACT
The nonequivalent groups with anchor test (NEAT) design involves data that are missing by design. Three popular equating methods for the NEAT design are the poststratification equating method, the chain equipercentile equating method, and the item‐response‐theory observed‐score‐equating method. Each of these methods makes different assumptions about the missing data in the NEAT design. Although studies have compared the equating performance of the three methods under the NEAT design, none has examined these missing data assumptions and their implications for such comparisons. The assumptions matter for equating studies because the missing data, or their distribution, must be filled in somehow to obtain a true, or criterion, equating function. That criterion is what the accuracy and bias of the different methods are measured against. If the missing data or their distribution are filled in using the assumptions of a given method, that method may be favored in any comparison with the others. This paper first describes the missing data assumptions of the three equating methods. It then performs a fair comparison of the three methods using data from three different operational tests. For each data set, we examine how the three methods perform when the missing data satisfy the assumptions made by only one of them. The chain equating method is somewhat more satisfactory overall than the other methods in our fair comparison. Hence, we recommend that equating practitioners seriously consider the chain equating method when using the NEAT design. In addition, we conclude that the results from the different equating methods will tend to agree with each other when proper equating conditions are in place. Moreover, to uncover problems that might not otherwise reveal themselves, it is important for operational testing programs to apply multiple equating methods and study the differences among their results.
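The abstract compares equipercentile-type and IRT methods, which are too involved for a short example. The minimal sketch below instead uses two linear analogs to show how a chain method and a poststratification-style method treat the NEAT missing data differently. This is a swapped-in illustration, not the authors' procedure.

- Chained linear equating links X to the anchor V in group P, then links V to Y in group Q.
- Tucker linear equating assumes that the linear regressions of total score on anchor score, including residual variance, are the same in P and Q. It uses that assumption to estimate moments on a synthetic population with weight `w_p` on P.

All names and data here are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class GroupMoments:
    """Moments of total score T and anchor score V in one group.

    In group P (new form) T is X; in group Q (old form) T is Y.
    Both groups take the same anchor V.
    """
    mean_t: float
    var_t: float
    mean_v: float
    var_v: float
    cov_tv: float


def group_moments(total, anchor):
    """Population (divide-by-n) moments for paired total/anchor scores."""
    n = len(total)
    mt = sum(total) / n
    mv = sum(anchor) / n
    var_t = sum((t - mt) ** 2 for t in total) / n
    var_v = sum((v - mv) ** 2 for v in anchor) / n
    cov = sum((t - mt) * (v - mv) for t, v in zip(total, anchor)) / n
    return GroupMoments(mt, var_t, mv, var_v, cov)


def chained_linear(x, p, q):
    """Link X to V linearly in P, then V to Y linearly in Q."""
    v = p.mean_v + sqrt(p.var_v / p.var_t) * (x - p.mean_t)
    return q.mean_t + sqrt(q.var_t / q.var_v) * (v - q.mean_v)


def tucker_linear(x, p, q, w_p=0.5):
    """Tucker linear equating on the synthetic population w_p*P + (1 - w_p)*Q."""
    w_q = 1.0 - w_p
    g_p = p.cov_tv / p.var_v  # regression slope of X on V in P
    g_q = q.cov_tv / q.var_v  # regression slope of Y on V in Q
    dm = p.mean_v - q.mean_v  # anchor mean difference between groups
    dv = p.var_v - q.var_v    # anchor variance difference between groups
    mean_xs = p.mean_t - w_q * g_p * dm
    mean_ys = q.mean_t + w_p * g_q * dm
    var_xs = p.var_t - w_q * g_p ** 2 * dv + w_p * w_q * g_p ** 2 * dm ** 2
    var_ys = q.var_t + w_p * g_q ** 2 * dv + w_p * w_q * g_q ** 2 * dm ** 2
    return mean_ys + sqrt(var_ys / var_xs) * (x - mean_xs)


# Hypothetical data in which the anchor mean and variance match across groups.
p = group_moments([10, 12, 14, 16, 18], [4, 5, 6, 7, 8])
q = group_moments([11, 13, 16, 17, 21], [4, 6, 5, 8, 7])
for x in range(10, 19, 2):
    print(x, round(chained_linear(x, p, q), 3), round(tucker_linear(x, p, q), 3))
```

In this deliberately simple case, the two groups have identical anchor means and variances, so both functions reduce to the same line. When the groups differ on the anchor, the two methods fill in the missing data differently and their results generally diverge. Running both and examining the difference is one way to apply the abstract's advice to compare multiple methods.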


Articles published on Anchor Test Design

REPEATER EFFECTS ON SCORE EQUATING FOR A GRADUATE ADMISSIONS EXAM

Equating Subscores under the Nonequivalent Anchor Test (NEAT) Design

Observed Score Equating Using Discrete and Passage‐Based Anchor Items

A New Approach to Comparing Several Equating Methods in the Context of the NEAT Design

Local Observed-Score Equating With Anchor-Test Designs

New Equating Methods and Their Relationships with Levine Observed Score Linear Equating Under the Kernel Equating Framework

SINGLE‐ VERSUS DOUBLE‐SCORING OF TREND RESPONSES IN TREND SCORE EQUATING WITH CONSTRUCTED‐RESPONSE TESTS

A Single Population Litmus Test for Linear Scale Alignment Methods: Commentary on Kane, Mroch, Suh, and Ripkey

Accumulative Equating Error after a Chain of Linear Equatings

The Missing Data Assumptions of the NEAT Design and their Implications for Test Equating

An Evaluation of Five Linear Equating Methods for the NEAT Design

An Empirical Comparison of Five Linear Equating Methods for the NEAT Design

Linear Equating for the NEAT Design: Parameter Substitution Models and Chained Linear Relationship Models

EVALUATING SUBPOPULATION INVARIANCE OF LINKING FUNCTIONS TO DETERMINE THE ANCHOR COMPOSITION FOR A MIXED‐FORMAT TEST

EFFECT OF REPEATERS ON SCORE EQUATING IN A LARGE‐SCALE LICENSURE TEST

COMPARISON OF THE EFFECTS OF DISCRETE ANCHOR ITEMS AND PASSAGE‐BASED ANCHOR ITEMS ON OBSERVED‐SCORE EQUATING RESULTS

THE MISSING DATA ASSUMPTIONS OF THE NONEQUIVALENT GROUPS WITH ANCHOR TEST (NEAT) DESIGN AND THEIR IMPLICATIONS FOR TEST EQUATING

CONSTRUCTION OF CHAINED TRUE SCORE EQUIPERCENTILE EQUATINGS UNDER THE KERNEL EQUATING (KE) FRAMEWORK AND THEIR RELATIONSHIP TO LEVINE TRUE SCORE EQUATING

DEVELOPMENT OF APPROXIMATIONS TO POPULATION INVARIANCE INDICES

Small‐Sample Equating Using a Synthetic Linking Function
