Abstract

The original purpose of this study was to address the test-disclosure-related need to introduce more Graduate Record Examinations (GRE) General Test editions each year than formerly, in a context of stable or possibly declining examinee volume. The legislative conditions that created this initial concern about test equating have abated. However, several of the test equating models considered in this research might provide other advantages to the GRE Program; these potential advantages are listed in the body of the report.

Equating can be considered to consist of three parts: (1) a data collection design, (2) an operational definition of the equating transformation, and (3) the specific statistical estimation techniques used to obtain the equating transformation. Currently, the GRE General Test collects data using an equivalent groups design. Typically, a linear equating method is used, and the specific estimation technique is setting means and standard deviations equal.

For this research, two other data collection designs were studied: nonrandom group, external anchor test and random group, preoperational section. Both item response theory (IRT) and linear equating definitions were used. IRT true score equating was based on item statistics for the three-parameter logistic model as estimated using LOGIST. Linear models included section pre-equating using the EM algorithm, Tucker's observed score model, and several true score models developed by Tucker and Levine. For each of the three GRE measures (verbal, quantitative, and analytical), all equating methods were assessed for bias and root mean squared error by equating a test edition to itself through a chain with six equating links.

Bias and root mean squared error were extremely large when the verbal and analytical measures were equated using section pre-equating or IRT equating with data based on the random group, preoperational section data collection design.
For the quantitative measure, this data collection design produced a small amount of bias but a moderate amount of root mean squared error. Using the nonrandom group, external anchor test data collection design, quantitative equatings had moderate amounts of both bias and root mean squared error. Verbal nonrandom group, external anchor test equatings showed relatively small amounts of bias and root mean squared error, with the Tucker observed score model performing particularly well. Bias was small for the analytical anchor test equatings, and root mean squared error ranged from small to moderate.

All nonrandom group, external anchor test methods worked about as well in practice for the verbal measure as the currently used random group method does in theory. The current random group method, however, has never been subjected to an empirical check comparable to the one used in this study for the experimental equating methods. Two anchor test methods, Tucker 2 True and Levine, appear to have worked as well in practice for the analytical measure as the random group method does in theory.

A possible explanation for the generally poor results of the equatings based on the random group, preoperational section data collection design is the constant use of the last section of the test to collect equating data. Now that the sections of the GRE General Test are administered in various orders in different editions of the test, the extreme bias found in this study for the verbal and analytical random group, preoperational section equatings may disappear or at least be substantially diminished.
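As a concrete illustration of the assessment criterion described above, the sketch below (in Python, with entirely invented score samples) builds a closed chain of six mean/standard-deviation linear equating links and measures the bias and root mean squared error of equating an edition back to itself. Note that with these idealized shared samples the linear maps telescope to the identity, so the round-trip error is numerically zero; in the study, each link was estimated from separate examinee data, so sampling error accumulates along the chain.

```python
import math
import statistics

def linear_link(x_scores, y_scores):
    """Mean/sigma linear equating: set means and standard deviations
    equal, i.e. y = (s_y / s_x) * (x - m_x) + m_y."""
    mx, my = statistics.mean(x_scores), statistics.mean(y_scores)
    sx, sy = statistics.stdev(x_scores), statistics.stdev(y_scores)
    return lambda x: (sy / sx) * (x - mx) + my

def chain_bias_rmse(links, scores):
    """Equate an edition to itself through a chain of links; for a
    perfect equating the round-trip score equals the original score."""
    def round_trip(x):
        for link in links:
            x = link(x)
        return x
    errors = [round_trip(s) - s for s in scores]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return bias, rmse

# Hypothetical raw-score samples for seven editions; the six links
# form a closed chain from edition 0 back to edition 0.
forms = [[40, 45, 50, 55, 60],
         [42, 48, 54, 60, 66],
         [38, 44, 49, 56, 61],
         [41, 46, 52, 57, 63],
         [39, 45, 51, 58, 64],
         [43, 47, 53, 59, 65],
         [40, 45, 50, 55, 60]]  # last edition is edition 0 again
links = [linear_link(forms[i], forms[i + 1]) for i in range(6)]
bias, rmse = chain_bias_rmse(links, range(30, 71))
```

This mirrors only the shape of the study's check, not its estimation methods: the report's chains used Tucker, Levine, section pre-equating, and IRT links rather than the simple mean/sigma links sketched here.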
