Abstract

What are the statistical practices of articles published in journals with a high impact factor? Are there differences compared with articles published in journals with a somewhat lower impact factor that have adopted editorial policies to reduce the impact of the limitations of Null Hypothesis Significance Testing? To investigate these questions, the current study analyzed all articles related to psychological, neuropsychological and medical issues published in 2011 in four journals with high impact factors: Science, Nature, The New England Journal of Medicine and The Lancet, and in three journals with relatively lower impact factors: Neuropsychology, Journal of Experimental Psychology: Applied and the American Journal of Public Health. Results show that Null Hypothesis Significance Testing without any use of confidence intervals, effect sizes, prospective power or model estimation is the prevalent statistical practice in articles published in Nature (89%), followed by articles published in Science (42%). By contrast, in all other journals, with both high and lower impact factors, most articles report confidence intervals and/or effect size measures. We interpret these differences as consequences of the editorial policies adopted by the journal editors, which are probably the most effective means to improve statistical practices in journals with high or low impact factors.

Highlights

  • Scientific papers published in journals with the highest impact factor (IF) are selected after a rigorous examination by peer reviewers, who assess their scientific value and methodological quality

  • In the highest impact factor (HIF) journals, NHST without confidence intervals, effect sizes, prospective power or model estimation is used in 89% of articles published in Nature and in 42% of articles published in Science, whereas it is used in only 14% and 7% of articles published in the New England Journal of Medicine (NEJM) and The Lancet, respectively

  • The estimation of prospective statistical power in HIF journals ranges from 0% in Science to 66% in The Lancet, whereas in lower IF (LIF) journals it ranges from 1% of articles published in the American Journal of Public Health (AJPH) to 23% of articles published in the Journal of Experimental Psychology: Applied (JEP-A)

Summary

Introduction

Scientific papers published in journals with the highest impact factor (IF) are selected after a rigorous examination by peer reviewers, who assess their scientific value and methodological quality. Assessing the statistical methods used is an important part of judging methodological quality. In the life and behavioral sciences, null hypothesis significance testing (NHST) is very often used, even though many scholars have, since the 1960s [1], identified its limited ability to answer the questions researchers ask and described damaging errors researchers commit when using it. NHST starts by assuming that a null hypothesis, H0, is true, where H0 is typically a statement of zero effect, zero difference, or zero correlation in the population of interest. A p value is calculated, where p is the probability, if H0 is true, of obtaining the observed result or a more extreme one. A low p value, typically p < .05, throws doubt on H0 and leads to the rejection of H0 and a conclusion that the effect in question is statistically significant. Statistical power is defined only in the context of NHST, but even so, we regard use of prospective statistical power—the calculation of power before collecting data, usually to guide choice of N—as an advance, because such use can help avoid some of the pitfalls of NHST.
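The prospective power calculation described above can be illustrated with a minimal sketch. This is not the procedure used by any of the surveyed journals; it is a standard normal-approximation formula for the sample size per group needed by a two-sided, two-sample test of a standardized mean difference d (Cohen's d), at significance level alpha and desired power. The function name and defaults are illustrative choices, not from the source.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample
    test of a standardized mean difference d (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = z.inv_cdf(power)           # quantile matching desired power
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)


# A "medium" effect (d = 0.5) at alpha = .05 with 80% power requires
# about 63 participants per group under this approximation.
print(n_per_group(0.5))
```

Running such a calculation before data collection—rather than reporting power after the fact—is the "prospective" use of power that the study counts as an advance over bare NHST.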
