Evaluating the Effects of Differences in Group Abilities on the Tucker and the Levine Observed-Score Methods for Common-Item Nonequivalent Groups Equating

Hanwei Chen, Rongchun Zhu, Zhongmin Cui, Xiaohong Gao

doi:10.1037/e548302011-001

Abstract

The most critical feature of a common-item nonequivalent groups equating design is that the average score difference between the new and old groups can be accurately decomposed into a group ability difference and a form difficulty difference. Two widely used observed-score linear equating methods, the Tucker and the Levine observed-score methods, make different statistical assumptions when decomposing the score difference. Variation in the decomposition of group ability and form difficulty differences can affect the equating results. This study confirmed previous findings in the literature that when form and group differences are small, both equating methods produce similar results. When the group ability difference is large, however, the Levine observed-score method produces more accurate equating results than the Tucker method. The results indicated that the Levine observed-score method not only decomposes form and group differences more accurately, but also yields smaller unweighted absolute equating differences and average weighted root mean square differences. This study showed that the Levine observed-score method is also robust to the form difference.

Introduction

A common-item nonequivalent groups equating design is used in many testing programs because of its flexibility in data collection. Important features of this design include: (1) each of the two examinee groups (new and old) is only required to take one alternative form of the test; (2) a set of common items is embedded in both the new and old forms, which links the two forms of the test; and (3) the common-item set should be viewed as a short version of the full-length test, which requires similar content and statistical specifications (including difficulty).
Among the applicable equating methods under the common-item nonequivalent groups design, two observed-score linear equating methods are of particular interest: the Tucker and the Levine observed-score equating methods. Because each examinee only takes one alternative form of the test, strong statistical assumptions are necessary in establishing the linear equating function for the new and old forms. Two statistical assumptions about the observed scores are made for the Tucker equating method: linear regression and conditional variances. The linear regression assumption indicates that the regression of the total scores on the common-item scores is the same for both the new and old populations. The conditional variances assumption requires that the conditional variances of the total scores given the common-item scores are the same in both populations. On the other hand, three statistical assumptions are made for the Levine observed-score equating method: correlational assumptions, linear regression assumptions, and error variance assumptions. The correlational assumptions specify that the true scores for the forms and the common items are perfectly correlated in the new and old populations. The linear regression assumptions mean that the regressions of the true scores for the new form (or old form) on the true scores for the common items are the same for both the new and old populations. Furthermore, the error variance assumptions mean that the measurement error variances for the new form, old form, or common items are the same for both the new and old populations (see Kolen and Brennan, 2004, pp. 105-117 for details). When all the assumptions are satisfied, research has indicated that both equating methods will produce the same results (Kolen, 1990; von Davier, 2008).
von Davier (2008) indicates that the Tucker and the Levine observed-score methods can theoretically produce the same equating results when the populations are the same and all assumptions for both equating methods are satisfied. Kolen (1990) suggests that if the two populations are similar in ability and the common-item scores are highly correlated with the total scores on the two forms of the test in a common-item nonequivalent groups design, all equating methods tend to produce the same results. Further, when comparing the Tucker and the Levine observed-score equating methods both empirically and theoretically, Kolen and Brennan (2004, pp. 128-129) suggested that the equating decisions favor (a) the Tucker method, when both examinee groups are similar in ability; (b) the Levine observed-score method, when the examinee groups are dissimilar in ability; or (c) not conducting equating, if the examinee groups are very different in ability or the forms are too dissimilar in difficulty. However, in practical situations, both form and group differences can exist in an equating and the magnitudes of the differences may vary. Therefore, under the common-item nonequivalent groups design, the interaction between examinee group difference and form difference is crucial to the equating results based on the different equating
