Abstract

KEY POINT: Performing multiple hypothesis tests in a study can markedly increase the chance of false-positive findings, and appropriate multiplicity adjustments are therefore typically required.

In this issue of Anesthesia & Analgesia, Dunn et al1 report results of a study on the relationship between parameters of mechanical ventilation and multiple outcomes after spine surgery. For the primary analysis, 144 statistical models were used. The authors used the Bonferroni technique to adjust for multiple testing and set the significance threshold at 0.05/144 ≈ 0.0003.

Figure: Excerpt from the Dunn et al1 Statistical Analysis section.

For the primary analysis, the authors tested for associations between 8 ventilator parameters and 18 outcome variables, corresponding to a total of 144 models. In each model, 1 hypothesis of primary interest was tested; the additional independent variables in each model were included to control for confounding rather than to draw inferences. The authors used the Bonferroni correction to control the family-wise error rate, which is the probability of committing any (1 or more) type I error within a set or "family" of hypothesis tests.

For any statistical hypothesis test, there is a chance of falsely rejecting the null hypothesis when it is in fact true.2,3 The probability of this type I error (α) is commonly set at .05. When multiple hypotheses are tested, each at a .05 significance level, the risk of a type I error increases: with m independent tests and all null hypotheses actually true, the probability of at least 1 false-positive result is 1 − (1 − α)^m. For 2 independent tests, the probability of incorrectly rejecting at least 1 null hypothesis thus approaches 10%; it increases to approximately 40% with 10 tests, and it can be >99% for 144 tests.

Multiple testing commonly arises in the following situations3–5: (1) several outcomes are studied, (2) several groups are compared with each other, (3) the same outcome is repeatedly studied over time, and (4) several subgroup analyses are performed.

Adjustments for multiple testing must be considered when at least 1 statistically significant result from a set of tests would lead to rather strong inferences. For example, testing the effect of a drug on 3 different primary outcomes and concluding that the drug works when it is superior to placebo on any 1 outcome carries a substantial risk of a false-positive finding unless an appropriate adjustment is made. In contrast, an adjustment is not needed when researchers declare superiority of the drug only if it is superior on all 3 tested outcomes. For explorative analyses, rigorous adjustments may not be required if the results are only hypothesis generating and do not lead to conclusive inferences.

The simplest approach to control the probability of committing any type I error with multiple testing is the Bonferroni correction, in which the significance threshold is set at α divided by the number of hypothesis tests (m). Only test results with P values smaller than the adjusted significance criterion are considered statistically significant. Alternatively, the P value of a test can be adjusted by multiplying it by the total number of tests, and the adjusted P value is then compared against α.
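To make these numbers concrete, the following minimal Python sketch (an illustration, not code from the Dunn et al1 study) computes the family-wise error rate for 2, 10, and 144 independent tests and applies the Bonferroni correction both ways; the P values are made up for demonstration purposes.

```python
# Minimal sketch (not from the article): type I error inflation and the
# Bonferroni correction; the P values below are hypothetical.
alpha = 0.05

# Family-wise error rate for m independent tests when all nulls are true:
# P(at least 1 type I error) = 1 - (1 - alpha)^m.
for m in (2, 10, 144):
    fwer = 1 - (1 - alpha) ** m
    print(f"m = {m:>3}: FWER = {fwer:.4f}")
# m =   2: FWER = 0.0975
# m =  10: FWER = 0.4013
# m = 144: FWER = 0.9994

# Bonferroni: compare each P value to alpha / m ...
m = 144
p_values = [0.0001, 0.004, 0.03]      # hypothetical P values
threshold = alpha / m                 # 0.05 / 144 ≈ 0.000347
significant = [p < threshold for p in p_values]

# ... or, equivalently, multiply each P value by m (capped at 1)
# and compare the adjusted P value against alpha.
adjusted = [min(p * m, 1.0) for p in p_values]
assert significant == [p < alpha for p in adjusted]
print(significant)  # [True, False, False]
```

Either form of the correction yields the same decisions; the adjusted-P-value form is often preferred in reports because readers can compare the adjusted values directly against the familiar .05 criterion.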
The Bonferroni technique is quite conservative, and several alternative approaches are available: (1) adjusting the significance criterion less conservatively (eg, the Šidák correction); (2) using adjustments tailored to specific situations (eg, the Tukey test for multiple pairwise comparisons); or (3) controlling the expected proportion of false positives among significant results, rather than the probability of any false-positive result (eg, the Benjamini–Hochberg procedure).
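As a rough illustration of the first and third alternatives, the sketch below (again with hypothetical P values, not data from any study) applies the Šidák threshold and the Benjamini–Hochberg step-up procedure; on the same data, Benjamini–Hochberg typically rejects more hypotheses because it controls the false discovery rate rather than the family-wise error rate.

```python
# Minimal sketch (illustrative only) of the Šidák and
# Benjamini–Hochberg procedures with made-up P values.
alpha = 0.05
p_values = sorted([0.001, 0.008, 0.012, 0.04, 0.20])
m = len(p_values)

# Šidák: exact FWER control for independent tests; the threshold
# 1 - (1 - alpha)^(1/m) is slightly less strict than alpha / m.
sidak = 1 - (1 - alpha) ** (1 / m)     # ≈ 0.0102 vs 0.05/5 = 0.0100
sidak_significant = [p < sidak for p in p_values]

# Benjamini–Hochberg: controls the false discovery rate (the expected
# proportion of false positives among rejected hypotheses). Find the
# largest rank i with p_(i) <= (i / m) * alpha and reject all
# hypotheses up to and including that rank.
k = 0
for i, p in enumerate(p_values, start=1):
    if p <= (i / m) * alpha:
        k = i
bh_significant = [i <= k for i in range(1, m + 1)]

print(sidak_significant)  # [True, True, False, False, False]
print(bh_significant)     # [True, True, True, True, False]
```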
