Inappropriate use of statistical power

Raphael A. Fraser

doi:10.1038/s41409-023-01935-3

Abstract

We are pleased to add this typescript, Inappropriate use of statistical power by Raphael Fraser, to the BONE MARROW TRANSPLANTATION Statistics Series. The author discusses how we sometimes misuse statistical analyses after a study is completed and analyzed to explain the results. The most egregious example is post hoc power calculations.

When the conclusion of an observational study or clinical trial is negative, namely, the data observed (or more extreme data) fail to reject the null hypothesis, people often argue for calculating the observed statistical power. This is especially true of clinical trialists who believe in a new therapy and who wished and hoped for a favorable outcome (rejecting the null hypothesis). One is reminded of the saying from Benjamin Franklin: "A man convinced against his will is of the same opinion still."

As the author notes, when we face a negative conclusion of a clinical trial there are two possibilities: (1) there is no treatment effect; or (2) we made a mistake. By calculating the observed power after the study, people (incorrectly) believe that if the observed power is high there is strong support for the null hypothesis. However, the argument usually made is the opposite: because the observed power is low, the null hypothesis was supposedly not rejected only because there were too few subjects. This is usually couched in terms such as "there was a trend towards…" or "we failed to detect a benefit because we had too few subjects" or the like. Observed power should not be used to interpret the results of a negative study. Put more strongly, observed power should not be calculated after a study is completed and analyzed. The power of the study to reject or not the null hypothesis is already incorporated in the calculation of the p value.

The author uses interesting analogies to make important points about hypothesis testing. Testing the null hypothesis is like a jury trial. The jury can find the defendant guilty or not guilty. They cannot find him innocent. It is always important to recall that failure to reject the null hypothesis does not mean the null hypothesis is true, only that there is insufficient evidence (data) to reject it. As the author notes: "In a sense, hypothesis testing is like world championship boxing where the null hypothesis is the champion until defeated by the challenger, the alternative hypothesis, to become the new world champion."

The author includes a discussion of what a p value is, a topic we discussed before in this series and elsewhere [1, 2]. Finally, there is a nice discussion of confidence intervals (frequentist) and credibility limits (Bayesian). A frequentist interpretation views probability as the limit of the relative frequency of an event after many trials. In contrast, a Bayesian interpretation views probability as a degree of belief in an event. This belief could be based on prior knowledge such as the results of previous trials, biological plausibility or personal beliefs (my drug is better than your drug). The important point is the common misinterpretation of confidence intervals. For example, many researchers interpret a 95 percent confidence interval to mean there is a 95 percent chance this interval contains the parameter value. This is wrong. It means that, if we repeat the identical study many times, 95 percent of the intervals will contain the true but unknown parameter in the population. This will seem strange to many people because we are interested only in the study we are analyzing, not in repeating the same study design many times.

We hope readers will enjoy this well-written summary of common statistical errors, especially post hoc calculations of observed power. Going forward we hope to ban statements like "there was a trend towards…" or "we failed to detect a benefit because we had too few subjects" from the Journal. Reviewers have been advised. Proceed at your own risk.
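The repeated-sampling reading of a 95 percent confidence interval can be checked with a small simulation. The sketch below is not taken from the article. It assumes normally distributed data with a known standard deviation and uses a simple z interval for the mean. The function name `coverage` and all parameter values are hypothetical choices for illustration.

```python
import random
from statistics import NormalDist


def coverage(true_mean=0.0, sigma=1.0, n=25, level=0.95,
             n_studies=10_000, seed=1):
    """Fraction of simulated studies whose z interval contains true_mean.

    Assumptions (illustrative only): normal data, known sigma,
    interval = sample mean +/- z * sigma / sqrt(n).
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for level = 0.95
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(n_studies):
        # One "study": draw n observations and compute the sample mean.
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        # The interval covers the truth when the mean lies within half_width.
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    print(f"Share of intervals containing the true mean: {coverage():.3f}")
```

Under these assumptions the printed share should be close to 0.95. The 95 percent describes the long-run behavior of the procedure across repeated studies, not the probability that any single computed interval contains the parameter.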
Robert Peter Gale, MD, PhD, DSc(hc), FACP, FRCP, FRCPI(hon), FRSM, Imperial College London; Mei-Jie Zhang, PhD, Medical College of Wisconsin.
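The statement that the power of a study is already incorporated in the p value can be illustrated for one simple case. The sketch below is not the author's method. It assumes a two-sided z-test and computes "observed power" by treating the observed effect as if it were the true effect. The function name `observed_power` is hypothetical. In this setting observed power depends on the data only through the p value, so it cannot add information.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def observed_power(p_value: float, alpha: float = 0.05) -> float:
    """Post hoc ("observed") power of a two-sided z-test.

    Assumption (illustrative only): the observed standardized effect
    is plugged in as the true effect. Requires 0 < p_value <= 1.
    """
    z_obs = _STD_NORMAL.inv_cdf(1 - p_value / 2)   # |z| implied by the p value
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)    # rejection threshold
    # Probability that a new |Z| exceeds z_crit when Z ~ Normal(z_obs, 1).
    upper = 1 - _STD_NORMAL.cdf(z_crit - z_obs)
    lower = _STD_NORMAL.cdf(-z_crit - z_obs)
    return upper + lower


if __name__ == "__main__":
    for p in (0.05, 0.10, 0.30, 0.70):
        print(f"p = {p:.2f} -> observed power = {observed_power(p):.3f}")
```

In this sketch a p value equal to alpha gives an observed power of roughly 0.5, and larger p values give lower observed power. Every non-significant result therefore has low observed power, which is why it cannot distinguish "no effect" from "too few subjects".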

Journal: Bone Marrow Transplantation	Publication Date: Mar 3, 2023
Citations: 4

