Dr. George Burket and I appear to agree on the existence of a problem in the calculation and interpretation of group scores on norm-referenced tests, but we part company on the question of that problem's causes and solutions. His comment (Burket, 1987) contains some fine measurement theory and insightful observations, particularly in its final three paragraphs. However, those paragraphs are largely irrelevant to the one point in the article that Burket especially controverts, namely my suggestion that publishers' scaled score development procedures may be largely responsible for the sorts of problems identified. Other portions of the comment, which are relevant to this controverted point, challenge ideas that I offered as hypotheses and as a basis for … (Baglin, 1986, p. 65) in a section titled "Why Does This Phenomenon Occur?" Nothing in the comment, however, has caused me to retract or modify this hypothesis or, for that matter, any part of the original article. Quite the contrary: the absence in the comment of any substantive discussion, much less rebuttal, of my hypothesized cause and suggested solutions leads me to strengthen my contention that at least a large share of the culpability for this problem lies precisely with the publishers' scaled score development procedures.

Four points in the comment should be clarified. First, the comment states that "[asymmetry] in the normative distributions makes possible the problem described at the outset in Baglin's paper, where a local mean falls below the national mean yet is well 'above average' in terms of percentiles (i.e., above the median)" (Burket, 1987, pp. 175-176). Theoretically, this could be a full or partial explanation of the phenomenon I cited if the original raw score normative distribution had been positively skewed, since in such a distribution the median typically lies below the mean, leaving room for a score that is below the mean yet above the median. But, as is noted in the article, that distribution was negatively skewed, which makes Burket's statement untenable.

Second, the comment states that "the apparent discrepancies ... noted by Baglin ... result from basic mathematical facts" (Burket, 1987, p. 175), one of which is identified as the asymmetry issue discussed above. Because the asymmetry explanation is inaccurate, it follows that this further contention is also inaccurate.

Third, the comment contends that "it is possible to come close in an effort to simultaneously transform two or more distributions to perfect normality on the same baseline" (Burket, 1987, p. 175) and cites my Table 7 as evidence. This is a selective use of the data: Tables 4, 8, and 9 show analogous maximum discrepancies of up to 10 times the magnitude of the one chosen as an example. It should not be inferred that the Table 7 situation is typical or even common.

Fourth, the comment's Table 1 and associated discussion deal exclusively with individual scores, not with group scores, which are the topic of my article. Furthermore, the percentile and NCE derived scores therein are straightforward computations based on the raw scores, a variant of what I suggested in the original article's
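The mean-median reasoning behind the first point can be illustrated with a small sketch. The two "national" raw-score distributions below are invented for illustration only (they are not data from either paper), and the percentile rank uses an assumed midpoint convention (percent of scores below, plus half of those tied). In the usual pattern, a positively skewed distribution has its median below its mean, so a local mean slightly below the national mean can still earn a percentile rank above 50; in the negatively skewed case it cannot.

```python
from statistics import mean, median


def percentile_rank(scores, x):
    """Percentile rank of x (assumed midpoint convention): % below x plus half of % equal to x."""
    below = sum(1 for s in scores if s < x)
    equal = sum(1 for s in scores if s == x)
    return 100.0 * (below + 0.5 * equal) / len(scores)


# Hypothetical national raw-score distributions, invented for illustration only.
positively_skewed = [10, 11, 12, 12, 13, 13, 14, 15, 30, 40]  # long right tail
negatively_skewed = [0, 10, 25, 26, 27, 27, 28, 28, 29, 30]   # long left tail


def describe(name, scores, local_mean):
    """Place a hypothetical local mean on a national distribution and report its percentile rank."""
    nat_mean, nat_median = mean(scores), median(scores)
    pr = percentile_rank(scores, local_mean)
    print(f"{name}: national mean={nat_mean}, median={nat_median}, "
          f"local mean={local_mean} -> PR={pr:.0f}")
    return nat_mean, nat_median, pr


if __name__ == "__main__":
    # In each case the local mean sits just below the national mean.
    describe("positive skew", positively_skewed, 16)  # mean 17, median 13: PR above 50
    describe("negative skew", negatively_skewed, 22)  # mean 23, median 27: PR below 50
```

In the negatively skewed example every value below the national mean (23) also lies below the national median (27), so no local mean below the national mean reaches a percentile rank above 50.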