The continuing misuse of null hypothesis significance testing in biological anthropology.

Richard J Smith

doi:10.1002/ajpa.23399

Abstract

There is over 60 years of discussion in the statistical literature concerning the misuse and limitations of null hypothesis significance tests (NHST). Based on the prevalence of NHST in biological anthropology research, it appears that the discipline generally is unaware of these concerns. The p values used in NHST usually are interpreted incorrectly. A p value indicates the probability of the data given the null hypothesis. It should not be interpreted as the probability that the null hypothesis is true or as evidence for or against any specific alternative to the null hypothesis. P values are a function of both the sample size and the effect size, and therefore do not indicate whether the effect observed in the study is important, large, or small. P values have poor replicability in repeated experiments. The distribution of p values is continuous and varies from 0 to 1.0. The use of a cut-off, generally p ≤ 0.05, to separate significant from nonsignificant results, is an arbitrary dichotomization of continuous variation. In 2016, the American Statistical Association issued a statement of principles regarding the misinterpretation of NHST, the first time it has done so regarding a specific statistical procedure in its 180-year history. Effect sizes and confidence intervals, which can be calculated for any data used to calculate p values, provide more and better information about tested hypotheses than p values and NHST.

Full Text