Abstract
The author distinguishes between the clinical and statistical meaning of varying levels of intertaster reliability for the 11 judges who evaluated 10 Chardonnays (6 American and 4 French) in the heralded 1976 Paris wine competition. Four wines showed weighted kappa values (<0.40) that are considered poor by established biostatistical criteria. These ranged from 0.10 for the French Beaune Clos des Mouches 1973 Chardonnay to 0.33 for the U.S. Veedercrest 1972 Chardonnay. However, when levels of statistical significance of the weighted kappa (Kw) values were obtained, only the Clos des Mouches failed to reach statistical significance at the .05 level. The other three wines (the U.S. Chateau Montelena 1973, with a Kw of 0.20; the U.S. 1973 David Bruce regular, with a Kw of 0.27; and the U.S. Veedercrest, with a Kw of 0.33) reached statistical significance at p values of <.05, <.001, and <.0001, respectively. These findings are not specific to weighted kappa. They reveal that when sample sizes are large enough, even the most trivial of results will be statistically significant while often being devoid of practical or clinical meaningfulness. A level of Kw that is clinically meaningful will most likely be statistically significant, but high levels of statistical significance are no guarantee of clinical significance. Methods for resolving this “big N phenomenon” are presented and discussed. (JEL Classification: C12, C49)
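The "big N phenomenon" can be illustrated with a minimal sketch. It is not the author's computation and does not use the Paris tasting data. It assumes two raters, a made-up 3 x 3 table of rating counts (`BASE`), linear agreement weights, and the large-sample standard error of weighted kappa under the null hypothesis of chance agreement (the form usually attributed to Fleiss, Cohen, and Everitt). The function name `weighted_kappa` is hypothetical.

```python
import math

def weighted_kappa(counts):
    """Linear-weighted kappa for two raters plus a large-sample z test of kappa = 0.

    counts[i][j] = number of items rated category i by rater A and j by rater B.
    Returns (kappa, z, two_sided_p).
    """
    k = len(counts)
    n = sum(sum(row) for row in counts)
    p = [[c / n for c in row] for row in counts]
    row_m = [sum(p[i]) for i in range(k)]                       # rater A marginals
    col_m = [sum(p[i][j] for i in range(k)) for j in range(k)]  # rater B marginals

    # Assumption: linear agreement weights (1 on the diagonal, 0 at maximum distance).
    w = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]

    p_obs = sum(w[i][j] * p[i][j] for i in range(k) for j in range(k))
    p_exp = sum(w[i][j] * row_m[i] * col_m[j] for i in range(k) for j in range(k))
    kappa = (p_obs - p_exp) / (1 - p_exp)

    # Assumption: null-hypothesis standard error from large-sample theory.
    w_row = [sum(col_m[j] * w[i][j] for j in range(k)) for i in range(k)]
    w_col = [sum(row_m[i] * w[i][j] for i in range(k)) for j in range(k)]
    var_num = sum(
        row_m[i] * col_m[j] * (w[i][j] - (w_row[i] + w_col[j])) ** 2
        for i in range(k) for j in range(k)
    ) - p_exp ** 2
    se0 = math.sqrt(var_num / n) / (1 - p_exp)

    z = kappa / se0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return kappa, z, p_value

# Made-up counts with weak agreement; NOT data from the 1976 competition.
BASE = [[4, 3, 3],
        [3, 4, 3],
        [3, 3, 4]]

for factor in (1, 10, 100):
    table = [[c * factor for c in row] for row in BASE]
    kappa, z, p_value = weighted_kappa(table)
    n = sum(sum(row) for row in table)
    print(f"N={n:5d}  Kw={kappa:.2f}  z={z:.2f}  p={p_value:.2g}")
```

Because every cell is multiplied by the same factor, the proportions and therefore Kw are identical in all three runs, while z grows with the square root of N. The same poor level of agreement moves from nonsignificant to highly significant purely because N increases, which is the distinction between statistical and clinical significance drawn in the abstract.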