Abstract

P values are a widely used, but pervasively misunderstood and fiercely contested, method of scientific inference. Display items, such as figures and tables, often contain the main results and are an important source of P values. We conducted a survey comparing the overall use of P values and the occurrence of significant P values in display items of a sample of articles in the three top multidisciplinary journals (Nature, Science, PNAS) in 2017 versus 1997. We also examined the reporting of multiplicity corrections and its potential influence on the proportion of statistically significant P values. Our findings demonstrated substantial and growing reliance on P values in display items, with increases of 2.5 to 14.5 times in 2017 compared with 1997. The overwhelming majority of P values (94%, 95% confidence interval [CI] 92% to 96%) were statistically significant. Methods to adjust for multiplicity were almost non-existent in 1997 but were reported in many articles relying on P values in 2017 (Nature 68%, Science 48%, PNAS 38%). When no correction was reported, almost all P values were statistically significant (98%, 95% CI 96% to 99%); when any multiplicity correction was described, 88% (95% CI 82% to 93%) of reported P values were statistically significant. Use of Bayesian methods was scant (2.5% of articles), and articles rarely (0.7%) relied exclusively on Bayesian statistics. Overall, wider appreciation of the need for multiplicity corrections is a welcome evolution, but the rapid growth of reliance on P values and the implausibly high rates of reported statistical significance are worrisome.

Highlights

  • The long-standing controversy over how best to make inferences from empirical data is intricately related to the notion of “statistical significance”

  • The articles contained a total of 1504 display items, distributed fairly symmetrically across journals and years, ranging from 204 (Nature 2017) to 284 (PNAS 2017). 110 articles (27% of all articles) included 287 display items containing P values (19% of all displays)

  • Our cross-sectional evaluation of P values reported in display items of three top science journals revealed a surge in their use over the last 20 years

Introduction

The long-standing controversy over how best to make inferences from empirical data is intricately related to the notion of “statistical significance”. The most widespread markers of statistical significance are P values derived from null hypothesis significance testing. A P value indicates “the probability that a chosen test statistic would have been at least as large as its observed value if every model assumption were true, including the test hypothesis” [1] (p. 339). The P = .05 cut-off for separating statistically significant from nonsignificant findings [2] has been widely adopted and embraced as a tool for deciding whether a research finding is “true, valid and worth acting on” [3]. One of the most widespread misunderstandings of P values is the notion that they “measure the probability that the studied hypothesis is true” [4] (p. 131).
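The multiplicity corrections whose reporting the survey tracks can be illustrated with the simplest such method, the Bonferroni correction, which divides the significance threshold by the number of tests performed. The sketch below uses hypothetical P values and the conventional 0.05 cutoff; it is not data from the study.

```python
# Illustrative sketch of the multiplicity problem with a Bonferroni correction.
# The ten P values below are hypothetical, chosen only for demonstration.
p_values = [0.001, 0.012, 0.034, 0.049, 0.21, 0.38, 0.47, 0.62, 0.78, 0.91]
alpha = 0.05

# Naive testing: compare each P value against alpha directly.
naive = [p for p in p_values if p < alpha]

# Bonferroni correction: compare each P value against alpha / (number of tests).
m = len(p_values)
bonferroni = [p for p in p_values if p < alpha / m]

print(len(naive))       # 4 findings "significant" without any correction
print(len(bonferroni))  # 1 finding survives the corrected threshold
```

The contrast between the two counts shows why uncorrected testing across many comparisons inflates the apparent rate of statistically significant results.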
