Abstract

The Baglin paper (1986) does a service in drawing attention to the widely shared confusion surrounding the calculation and interpretation of group scores on norm-referenced tests. The realities underlying this confusion have important practical implications and involve difficult and fundamental psychometric issues. The paper does a disservice in offering a fallacious and misleading discussion of the source of the perceived problem and its possible solutions. In this response, my intention is to characterize the source of the perceived problem, to point out some of the fallacies in the Baglin paper, and to give an opinion on solutions.

The basic concern that Baglin addresses is that the many different methods available for calculating a group summary score can give substantially different results. These differences can lead to contradictory interpretations when making group comparisons. Two types of apparent discrepancies are noted by Baglin: (a) those that arise when means are calculated using different units, and (b) those between medians and means. Baglin's major fallacy is the inference that these apparent discrepancies are due to inappropriate scaling procedures. As will be explained, both differences result from basic mathematical facts, so that no conceivable change in publishers' scaling procedures would eliminate them.

The source of the first type of difference is the mathematical fact that the mean calculated in one scale and then converted to another scale will not necessarily equal the result when the scores are first converted and the mean then calculated using the converted scores. Of course, it is true that the results would agree if one scale could be obtained as a linear transformation of the other, but the score scales in common use, namely percentiles, normal curve equivalents (NCEs), grade equivalents (GEs), raw scores, and scale scores, are all nonlinearly related to one another.
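The first type of difference can be illustrated with a minimal sketch. It assumes the conventional definition of the NCE as a normalized score with mean 50 and standard deviation 21.06, and it uses a made-up group of percentile ranks; the function names and the data are hypothetical and do not come from Baglin's tables.

```python
from statistics import NormalDist, mean

# Assumption: NCE = 50 + 21.06 * z, where z is the standard normal
# deviate corresponding to the percentile rank.
_STD_NORMAL = NormalDist()
NCE_SD = 21.06


def percentile_to_nce(percentile_rank):
    """Convert a percentile rank (strictly between 0 and 100) to an NCE."""
    return 50 + NCE_SD * _STD_NORMAL.inv_cdf(percentile_rank / 100)


def nce_to_percentile(nce):
    """Convert an NCE back to a percentile rank."""
    return 100 * _STD_NORMAL.cdf((nce - 50) / NCE_SD)


def compare_group_means(percentile_ranks):
    """Return (mean of percentiles, percentile of the mean NCE)."""
    mean_of_percentiles = mean(percentile_ranks)
    mean_nce = mean(percentile_to_nce(p) for p in percentile_ranks)
    return mean_of_percentiles, nce_to_percentile(mean_nce)


# Hypothetical, asymmetric group of individual percentile ranks.
hypothetical_group = [5, 10, 20, 40, 90]
averaged_first, converted_first = compare_group_means(hypothetical_group)
print(f"mean of percentiles:    {averaged_first:.1f}")
print(f"percentile of mean NCE: {converted_first:.1f}")
```

The two summaries disagree because the percentile-to-NCE conversion is nonlinear. For a group whose percentile ranks are symmetric about 50, such as 10, 50, and 90, the two routes happen to coincide, which is why the discrepancy is easy to overlook.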
(The relationship between NCEs and Thurstone scale scores is linear only in theory, since in practice it is not possible to simultaneously transform two or more distributions to perfect normality on the same baseline. That it is possible to come close may be inferred from Baglin's Table 7, where the maximum discrepancy between the percentile for the mean California Achievement Tests scaled score [CAT SS] and the percentile for the mean NCE is 0.8.) The source of the second type of difference is the mathematical fact that means and medians need not agree and, in general, will not agree unless the score distribution is symmetric. Although it is true that the score distribution for the publisher's norm group is symmetric by definition for such derived scores as percentiles and NCEs, local distributions will not in general be strictly symmetric. The confusion becomes more acute where the normative distributions are skewed, as will in general be the case for GEs, raw scores, and scale scores based
