Abstract

Appropriate descriptions of statistical methods are essential for evaluating research quality and reproducibility. Despite continued efforts to improve reporting in publications, inadequate descriptions of statistical methods persist. At times, reading statistical methods sections can conjure feelings of déjà vu, with content that resembles text cut and pasted from already published work, or "boilerplate text". Instances of boilerplate text suggest a mechanistic approach to statistical analysis, in which the same default methods are used and described with standardized text. To investigate the extent of this practice, we analyzed text extracted from published statistical methods sections from PLOS ONE and the Australian and New Zealand Clinical Trials Registry (ANZCTR). Topic modeling was applied to analyze data from 111,731 papers published in PLOS ONE and 9,523 studies registered with the ANZCTR. PLOS ONE topics emphasized definitions of statistical significance, software and descriptive statistics. One in three PLOS ONE papers contained at least one sentence that was a direct copy from another paper, and 12,675 papers (11%) closely matched the sentence "a p-value < 0.05 was considered statistically significant". Common topics across ANZCTR studies differentiated between study designs and analysis methods, with matching text found in approximately 3% of sections. Our findings quantify a serious problem affecting the reporting of statistical methods and shed light on perceptions about the communication of statistics as part of the scientific process. The results further emphasize the importance of rigorous statistical review to ensure that adequate descriptions of methods are prioritized over relatively minor details, such as p-values and software, when reporting research outcomes.
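To make the two kinds of matching in the abstract concrete, here is a minimal sketch of one way to flag papers that share an identical sentence with another paper, and sentences that closely match the quoted p-value sentence. This is not the authors' method: the normalization rules, the use of `difflib.SequenceMatcher`, the 0.9 threshold and all function names are illustrative assumptions.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

# Reference sentence quoted in the abstract.
TARGET = "a p-value < 0.05 was considered statistically significant"

def normalize(sentence: str) -> str:
    """Lower-case, unify p-value spellings and spacing (hypothetical rules)."""
    s = sentence.lower()
    s = re.sub(r"\bp[\s-]*values?\b", "p-value", s)  # 'P value', 'p-values' -> 'p-value'
    s = re.sub(r"\s*<\s*", " < ", s)                 # consistent spacing around '<'
    s = re.sub(r"\s+", " ", s)                       # collapse runs of whitespace
    return s.strip(" .")                             # drop edge spaces and full stops

def similarity(a: str, b: str) -> float:
    """difflib ratio in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def closely_matches(sentence: str, target: str = TARGET, threshold: float = 0.9) -> bool:
    # The 0.9 cut-off is an illustrative assumption, not the study's threshold.
    return similarity(sentence, target) >= threshold

def papers_with_copied_sentence(papers: dict[str, list[str]]) -> set[str]:
    """IDs of papers sharing at least one normalized sentence with another paper."""
    owners = defaultdict(set)  # normalized sentence -> IDs of papers containing it
    for paper_id, sentences in papers.items():
        for s in sentences:
            owners[normalize(s)].add(paper_id)
    return {pid for ids in owners.values() if len(ids) > 1 for pid in ids}
```

Normalizing before comparison means trivial differences in capitalization or spacing do not hide a copied sentence; a stricter, verbatim definition of "direct copy" would skip that step.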

Highlights

  • An ideal statistical analysis uses appropriate methods to draw insights from data and address the research questions

  • We used two openly available data sources to find statistical methods sections: research articles published in PLOS ONE and study protocols registered on the Australian and New Zealand Clinical Trials Registry (ANZCTR)

  • Statistical methods sections were missing for some studies downloaded from ANZCTR, including sections labelled as “Not applicable”, “Nil” or “None”. Since these studies would be excluded from topic modeling, we examined whether the statistical methods section was more likely to be missing for particular types of studies
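The missing-section check described in the highlights could be sketched as below: treat empty fields and placeholder entries as missing, then compare the missing rate across groups of studies. The placeholder set (beyond the three quoted values), the grouping labels and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Placeholders quoted in the highlights, plus "n/a" as an illustrative extra.
PLACEHOLDERS = {"not applicable", "nil", "none", "n/a"}

def is_missing(section: Optional[str]) -> bool:
    """True if a statistical methods field is empty or contains only a placeholder."""
    if section is None:
        return True
    # Drop punctuation other than '/', lower-case, and collapse whitespace.
    text = " ".join(re.sub(r"[^\w/\s]", "", section).lower().split())
    return text == "" or text in PLACEHOLDERS

def missing_rate_by_group(records: Iterable[Tuple[str, Optional[str]]]) -> dict:
    """Share of missing sections per group; `records` holds (group, section) pairs."""
    totals, missing = Counter(), Counter()
    for group, section in records:
        totals[group] += 1
        if is_missing(section):
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}
```

For example, grouping by a hypothetical study-type label gives one proportion per group, which can then be compared to see whether some kinds of studies are over-represented among the exclusions.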


Summary

Methods

We used two openly available data sources to find statistical methods sections: research articles published in PLOS ONE and study protocols registered on the Australian and New Zealand Clinical Trials Registry (ANZCTR). PLOS ONE reviewers are required to assess submissions against several publication criteria, including whether: “Experiments, statistics, and other analyses are performed to a high technical standard and are described in sufficient detail” [20]. Authors are encouraged to follow published reporting guidelines, such as those collected by the EQUATOR Network, to ensure that the chosen statistical methods are appropriate for the study design and that adequate details are provided to enable independent replication of results. Text cleaning aimed to standardize notation and statistical terminology, whilst minimizing changes to article style and formatting. We retained selected stop words that, if excluded, may have changed the context of the statistical methods being described, for example ‘between’ and ‘against’.
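The text cleaning described above can be sketched as follows: standardize a few notation variants, tokenize, and remove stop words except those that carry statistical meaning. This is not the authors' pipeline; the short stop-word list, the notation rules and the function name `clean` are hypothetical, and only the retained words ‘between’ and ‘against’ come from the text.

```python
import re

# Small illustrative stop-word list (an assumption; real pipelines use longer lists).
STOP_WORDS = {"a", "an", "the", "of", "and", "was", "were", "is", "to", "in", "for",
              "using", "between", "against"}
# Stop words retained because removing them could change the statistical meaning.
RETAINED = {"between", "against"}
# Hypothetical rules mapping notation variants onto a single token.
NOTATION_RULES = [
    (r"\bp[\s-]*values?\b", "pvalue"),       # 'P value', 'p-values' -> 'pvalue'
    (r"\bt[\s-]*tests?\b", "ttest"),         # 't-test', 't tests'   -> 'ttest'
    (r"\bchi[\s-]*squared?\b", "chisquare"), # 'chi-square(d)'       -> 'chisquare'
]

def clean(text: str) -> list[str]:
    """Lower-case, standardize notation, tokenize, and drop non-retained stop words."""
    s = text.lower()
    for pattern, replacement in NOTATION_RULES:
        s = re.sub(pattern, replacement, s)
    # Words, decimal numbers and comparison symbols become separate tokens.
    tokens = re.findall(r"[a-z]+|\d+(?:\.\d+)?|[<>=]", s)
    stop = STOP_WORDS - RETAINED
    return [t for t in tokens if t not in stop]
```

Keeping numbers and comparison symbols as tokens preserves thresholds such as "< 0.05", which a plain word tokenizer would discard.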

