The reproducibility of research and the misinterpretation of p-values.

David Colquhoun

doi:10.1098/rsos.171085

Abstract

We wish to answer this question: If you observe a ‘significant’ p-value after doing a single unbiased experiment, what is the probability that your result is a false positive? The weak evidence provided by p-values between 0.01 and 0.05 is explored by exact calculations of false positive risks. When you observe p = 0.05, the odds in favour of there being a real effect (given by the likelihood ratio) are about 3 : 1. This is far weaker evidence than the odds of 19 to 1 that might, wrongly, be inferred from the p-value. And if you want to limit the false positive risk to 5%, you would have to assume that you were 87% sure that there was a real effect before the experiment was done. If you observe p = 0.001 in a well-powered experiment, it gives a likelihood ratio of almost 100 : 1 odds on there being a real effect. That would usually be regarded as conclusive. But the false positive risk would still be 8% if the prior probability of a real effect were only 0.1. And, in this case, if you wanted to achieve a false positive risk of 5% you would need to observe p = 0.00045. It is recommended that the terms ‘significant’ and ‘non-significant’ should never be used. Rather, p-values should be supplemented by specifying the prior probability that would be needed to produce a specified (e.g. 5%) false positive risk. It may also be helpful to specify the minimum false positive risk associated with the observed p-value. Despite decades of warnings, many areas of science still insist on labelling a result of p < 0.05 as ‘statistically significant’. This practice must contribute to the lack of reproducibility in some areas of science. This is before you get to the many other well-known problems, like multiple comparisons, lack of randomization and p-hacking. Precise inductive inference is impossible and replication is the only way to be sure. Science is endangered by statistical misunderstanding, and by senior people who impose perverse incentives on scientists.
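The arithmetic linking the likelihood ratio, the prior probability and the false positive risk can be sketched with Bayes' theorem in odds form. This is a minimal illustration, not the exact calculation used in the paper: the function names `false_positive_risk` and `prior_needed` are hypothetical, and the likelihood ratios 3 and 100 are the rounded figures quoted in the abstract ("about 3 : 1", "almost 100 : 1"), so the outputs only approximate the 87% and 8% stated there.

```python
def false_positive_risk(likelihood_ratio, prior):
    """Probability that there is no real effect, given the observed result.

    likelihood_ratio: odds in favour of a real effect supplied by the data
                      (assumed known here; the paper computes it exactly).
    prior: prior probability that there is a real effect (0 < prior < 1).
    """
    if likelihood_ratio <= 0:
        raise ValueError("likelihood_ratio must be positive")
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    # Bayes' theorem in odds form: posterior odds = likelihood ratio x prior odds
    posterior_odds = likelihood_ratio * prior_odds
    # Probability of the null hypothesis after seeing the data
    return 1.0 / (1.0 + posterior_odds)


def prior_needed(likelihood_ratio, target_fpr=0.05):
    """Reverse Bayesian step: prior probability of a real effect that would
    be needed to reach the target false positive risk."""
    if likelihood_ratio <= 0:
        raise ValueError("likelihood_ratio must be positive")
    if not 0.0 < target_fpr < 1.0:
        raise ValueError("target_fpr must lie strictly between 0 and 1")
    posterior_odds_needed = (1.0 - target_fpr) / target_fpr
    prior_odds = posterior_odds_needed / likelihood_ratio
    return prior_odds / (1.0 + prior_odds)


# Rounded likelihood ratios taken from the abstract (assumptions, not exact values)
print(f"{false_positive_risk(3, 0.5):.3f}")    # p = 0.05, prior 0.5: 0.250
print(f"{prior_needed(3, 0.05):.3f}")          # p = 0.05, target 5%: 0.864
print(f"{false_positive_risk(100, 0.1):.3f}")  # p = 0.001, prior 0.1: 0.083
```

With a likelihood ratio of only 3, even a 50 : 50 prior leaves a false positive risk of 25%, which is the sense in which p = 0.05 provides weak evidence.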

Highlights

The major point of this paper is that the test of significance does not provide the information concerning psychological phenomena characteristically attributed to it; and that, a great deal of mischief has been associated with its use.
The fact that we hardly ever have a valid value for the prior probability means that it is impossible to calculate the false positive risk.
It is often said that p-values exaggerate the evidence against the null hypothesis; this is not strictly true.

Summary

Introduction

The major point of this paper is that the test of significance does not provide the information concerning psychological phenomena characteristically attributed to it; and that, a great deal of mischief has been associated with its use. What you want to know is this: when a statistical test of significance comes out positive, what is the probability that you have a false positive, i.e. that there is no real effect and the results have occurred by chance? This probability is defined here as the false positive risk (FPR). It is assumed throughout this paper that we wish to test a precise hypothesis, e.g. that the effect size is zero (though it makes little difference if we allow a narrow band around zero [3,4]). The reasonableness of this approach is justified in appendix A1. Before getting to results, it will be helpful to clarify the ideas that will be used.

Definition of terms

Which interpretation is better: ‘p-less-than’ or ‘p-equals’?

Simulation versus exact calculation

Likelihood ratios

Observed likelihood ratios

False positive risk as function of sample size

The reverse Bayesian argument

Discussion

Conclusion and what should be done?

The point null hypothesis

Calculation of likelihood ratios and false positive risks

The calculation of the prior probability: the reverse Bayesian approach

How to do the calculations

Findings

Bayesian estimation in single molecule kinetics


Journal: Royal Society Open Science
Publication Date: Dec 1, 2017
Citations: 182
License type: cc-by