Abstract

Common reporting styles for statistical results in scientific articles, such as p-values and confidence intervals (CI), have been reported to be prone to dichotomous interpretations, especially with respect to the null hypothesis significance testing (NHST) framework. For example, when the p-value is small enough or the CIs of the mean effects of a studied drug and a placebo do not overlap, scientists tend to claim a significant difference while often disregarding the magnitudes and absolute differences in the effect sizes. This type of reasoning has been shown to be potentially harmful to science. Techniques relying on the visual estimation of the strength of evidence have been recommended to reduce such dichotomous interpretations, but their effectiveness has also been challenged. We ran two experiments on researchers with expertise in statistical analysis to compare several alternative representations of confidence intervals, and we used Bayesian multilevel models to estimate the effects of the representation styles on differences in researchers' subjective confidence in the results. We also asked about the respondents' opinions and preferences regarding the representation styles. Our results suggest that adding visual information to the classic CI representation can decrease the tendency towards dichotomous interpretations, measured as the 'cliff effect' (the sudden drop in confidence around a p-value of 0.05), compared with the classic CI visualization and a textual representation of the CI with p-values. All data and analyses are publicly available at https://github.com/helske/statvis.
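
As a hedged illustration of why dichotomous interpretation is fragile (this sketch is not taken from the paper or its repository, and the sample sizes and effect sizes below are invented for the example), the following Python snippet compares two hypothetical drug-versus-placebo results whose estimated mean differences are nearly identical yet fall on opposite sides of p = 0.05:

```python
# Illustrative sketch only (not code from the paper): two comparisons whose
# estimated effects differ by 0.01 standard deviations land on opposite sides
# of p = 0.05, even though their effect sizes and confidence intervals are
# almost indistinguishable.
from scipy import stats

n, sd = 40, 1.0                      # hypothetical per-group sample size and SD
for diff in (0.44, 0.45):            # hypothetical mean(drug) - mean(placebo)
    t, p = stats.ttest_ind_from_stats(diff, sd, n, 0.0, sd, n)
    se = sd * (2 / n) ** 0.5         # standard error of the difference (pooled)
    half = stats.t.ppf(0.975, 2 * n - 2) * se
    print(f"diff = {diff:.2f}, p = {p:.3f}, "
          f"95% CI = ({diff - half:.2f}, {diff + half:.2f})")

# Approximate output:
# diff = 0.44, p = 0.053, 95% CI = (-0.01, 0.89)
# diff = 0.45, p = 0.048, 95% CI = (0.00, 0.90)
```

Read dichotomously, the first result is "not significant" and the second "significant", even though the point estimates and CIs convey essentially the same evidence.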

Highlights

  • Our results suggest that, despite the increased debate around null hypothesis significance testing (NHST) and related concepts, the problem of dichotomous thinking persists in the scientific community, but that certain visualization styles can help to reduce the cliff effect and should be used and studied further

Introduction

One of the most common research questions in many scientific fields is "Does X have an effect on Y?", where, for example, X is a new drug and Y is a disease. It is important to note that, given a single sample and the corresponding CI, we cannot infer whether the true population mean, m, is contained within the CI or not [18]. However, the CI has a direct connection to NHST in that the 95 percent CI represents the range of values of m for which the difference between m and the sample mean x̄ is not statistically significant at the 5 percent level. If we obtained a p-value of 0.058, many researchers, despite the small difference from the 0.05 threshold, would follow the recommendations of colleagues and textbooks and consider this as not enough evidence against H0 [19]. This type of reasoning, often called dichotomous thinking or dichotomous inference, has been shown to be potentially harmful to science [2], [20], [21], [22], [23]. While dichotomous thinking has been heavily criticized by scholars …
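
To make the connection between the CI and NHST described above concrete, here is a minimal numerical check (an illustrative sketch, not part of the paper's analysis code; the simulated sample, seed, and sample size are arbitrary choices for the example): a one-sample t-test of H0: m = m0 gives p = 0.05 exactly when m0 is an endpoint of the 95 percent CI, and p < 0.05 for values outside it.

```python
# Illustrative sketch: the 95% CI for the mean is the set of hypothesized
# values m0 that a one-sample t-test does not reject at the 5% level.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(loc=1.0, scale=2.0, size=30)   # a single simulated sample

mean = x.mean()
se = stats.sem(x)                              # standard error of the mean
lo, hi = stats.t.interval(0.95, len(x) - 1, loc=mean, scale=se)

for m0 in (lo, hi, hi + 0.5):                  # both CI endpoints, one value outside
    p = stats.ttest_1samp(x, popmean=m0).pvalue
    print(f"H0: m = {m0:.3f}  ->  p = {p:.3f}")

# Expected pattern: p = 0.050 at both CI endpoints, p < 0.05 outside the interval.
```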
