Abstract

The author distinguishes between the clinical and statistical meaning of varying levels of intertaster reliability for the 11 judges who evaluated 10 Chardonnays (6 American and 4 French) in the heralded 1976 Paris wine competition. Four wines showed weighted kappa (Kw) values below 0.40, a level considered poor by established biostatistical criteria. These values ranged from 0.10 for the French Beaune Clos des Mouches 1973 Chardonnay to 0.33 for the U.S. Veedercrest 1972 Chardonnay. However, when levels of statistical significance of the Kw values were obtained, only the Clos des Mouches failed to reach statistical significance at the .05 level. The other three wines (the U.S. Chateau Montelena 1973, with a Kw of 0.20; the U.S. David Bruce 1973 regular, with a Kw of 0.27; and the U.S. Veedercrest 1972, with a Kw of 0.33) reached statistical significance at p values of <.05, <.001, and <.0001, respectively. These findings are not specific to weighted kappa. They reveal that when sample sizes are large enough, even the most trivial results will be statistically significant while often being devoid of practical or clinical meaningfulness. A level of Kw that is clinically meaningful will most likely be statistically significant, but high levels of statistical significance are no guarantee of clinical significance. Methods for resolving this "big N phenomenon" are presented and discussed. (JEL Classification: C12, C49)
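The "big N phenomenon" can be illustrated with a minimal sketch. It uses assumptions that are not taken from the paper: plain (unweighted) Cohen's kappa instead of weighted kappa, a made-up 2x2 agreement pattern between two hypothetical raters, and the large-sample null-hypothesis variance of Fleiss, Cohen, and Everitt (1969). The function name `kappa_with_null_test` is illustrative only. The agreement pattern is held fixed while the number of rated items grows.

```python
import math

def kappa_with_null_test(table):
    """Unweighted Cohen's kappa and a z test of H0: kappa = 0.

    `table` is a square list of lists of counts (rater A in rows,
    rater B in columns). This is an illustrative sketch, not the
    weighted kappa procedure used in the paper.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    p = [[cell / n for cell in row] for row in table]
    row_m = [sum(p[i]) for i in range(k)]
    col_m = [sum(p[i][j] for i in range(k)) for j in range(k)]

    p_o = sum(p[i][i] for i in range(k))              # observed agreement
    p_e = sum(row_m[i] * col_m[i] for i in range(k))  # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)

    # Large-sample variance of kappa under H0 (Fleiss, Cohen & Everitt, 1969).
    correction = sum(
        row_m[i] * col_m[i] * (row_m[i] + col_m[i]) for i in range(k)
    )
    var0 = (p_e + p_e ** 2 - correction) / (n * (1 - p_e) ** 2)

    z = kappa / math.sqrt(var0)
    p_value = math.erfc(abs(z) / math.sqrt(2))        # two-sided normal p value
    return kappa, z, p_value

# Hypothetical agreement pattern (cell proportions), chosen so kappa = 0.20.
proportions = [[0.30, 0.20], [0.20, 0.30]]

for n in (20, 100, 1000):
    counts = [[cell * n for cell in row] for row in proportions]
    kappa, z, p_value = kappa_with_null_test(counts)
    print(f"N={n:5d}  kappa={kappa:.2f}  z={z:.2f}  p={p_value:.2g}")
```

In this sketch kappa stays at 0.20, a "poor" level by the 0.40 criterion, for every N, while z grows in proportion to the square root of N. The same agreement pattern is nonsignificant at N = 20, falls just under .05 at N = 100, and is highly significant at N = 1000.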
