Abstract

P-values are routinely calculated when testing hypotheses in quantitative settings, and low P-values are typically used as evidential measures to support research findings in published medical research. This article reviews old and new arguments questioning the evidential value of P-values. Critiques of the P-value include that it is confounded and fickle, and that it overstates the evidence against the null. P-values may turn out falsely low in studies because of random or systematic errors, and even correctly low P-values do not, by themselves, logically support any hypothesis. Recent studies show low replication rates for significant findings, calling into question the dependability of published low P-values. P-values are poor stand-alone indicators in support of scientific propositions; they must be interpreted in light of a thorough understanding of the study's question, design, and conduct. Null hypothesis significance testing will likely remain an important method in quantitative analysis, but it may be complemented with other statistical techniques that more directly address the size and precision of an effect or the plausibility that a hypothesis is true.

Highlights

  • The P-value is the most well-known statistic, typically accompanying measures of effect or association in scientific publications reporting the results of quantitative analyses

  • The P-value is the result of a significance test, a test often credited to the statistical pioneer Ronald Fisher, who published seminal books and papers on the development of statistical methods between 1920 and 1960

  • Even though P-values are not strictly necessary for hypothesis testing, Fisher's test of significance and the Neyman–Pearson rule of behavior were inevitably combined into the procedure known as null hypothesis significance testing


INTRODUCTION

The P-value is the most well-known statistic, typically accompanying measures of effect or association in scientific publications reporting the results of quantitative analyses. The P-value is the result of a significance test, a test often credited to the statistical pioneer Ronald Fisher, who published seminal books and papers on the development of statistical methods between 1920 and 1960. Fisher regarded the P-value as an informal but objective index of evidence against the null hypothesis, to be used by the researcher to judge whether the data are compatible with the null. Some years after Fisher introduced the significance test, two other statisticians, Jerzy Neyman and Egon Pearson, developed the theory of hypothesis testing, in which the data determine whether the null should be rejected in favor of an alternative hypothesis. Neyman and Pearson dismissed Fisher's evidential interpretation of P-values and were more concerned with controlling long-term error rates in hypothesis testing through deliberate choice of rejection levels and study power.
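The Fisherian P-value described above can be made concrete with a small simulation. The sketch below (an illustration, not taken from the article; the function name and data are hypothetical) computes a two-sided permutation P-value for a difference in group means: the fraction of label shufflings that produce a difference at least as extreme as the one observed, i.e., the probability of data this extreme if the null of no group difference were true.

```python
import random

def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the fraction of random label shufflings whose absolute
    mean difference is at least as large as the observed one -- a
    P-value in Fisher's sense: how surprising the data would be
    under the null hypothesis of no group difference.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a
                   - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical data: two samples drawn from the same distribution,
# so any apparent difference is noise and P should usually be large.
rng = random.Random(1)
a = [rng.gauss(0, 1) for _ in range(30)]
b = [rng.gauss(0, 1) for _ in range(30)]
p_null = permutation_p_value(a, b)

# The same sample shifted by 1.5 units: a large true effect,
# so the P-value should be very small.
c = [x + 1.5 for x in b]
p_shift = permutation_p_value(a, c)
print(p_null, p_shift)
```

Note what the simulation does and does not deliver: a low `p_shift` flags the data as surprising under the null, but, as the article argues, it says nothing by itself about the size of the effect or the probability that any hypothesis is true.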
