Abstract

Academic graphs are essential for communicating complex scientific ideas and results. To ensure that these graphs truthfully reflect underlying data and relationships, visualization researchers have proposed several principles to guide the graph creation process. However, the extent of violations of these principles in academic publications is unknown. In this work, we develop a deep learning-based method to accurately measure violations of the proportional ink principle (AUC = 0.917), which states that the size of shaded areas in graphs should be consistent with their corresponding quantities. We apply our method to analyze a large sample of bar charts contained in 300K figures from open access publications. Our results estimate that 5% of bar charts contain proportional ink violations. Further analysis reveals that these graphical integrity issues are significantly more prevalent in some research fields, such as psychology and computer science, and in some regions of the globe. Additionally, we find no temporal or seniority trends in violations. Finally, apart from openly releasing our large annotated dataset and method, we discuss how computational research integrity could become part of the peer-review and publication processes.
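The proportional ink principle can be checked directly when the plotted quantities and the axis baseline are known: the ratio of any two bars' ink (their drawn heights) should match the ratio of the quantities they represent. The sketch below is a minimal, hypothetical illustration of that check, not the paper's deep learning method; the function name, tolerance, and inputs are assumptions for the example.

```python
def proportional_ink_violation(values, axis_min=0.0, tolerance=0.05):
    """Flag a bar chart whose ink ratios deviate from its value ratios.

    values: the positive quantities each bar represents.
    axis_min: the y-axis baseline the bars are drawn from; a truncated
    axis (axis_min > 0) shrinks each bar's ink disproportionately.
    tolerance: allowed relative deviation between ink and value ratios.
    """
    inks = [v - axis_min for v in values]  # visible (drawn) bar heights
    ref_val, ref_ink = values[0], inks[0]
    for v, ink in zip(values[1:], inks[1:]):
        value_ratio = v / ref_val
        ink_ratio = ink / ref_ink
        if abs(ink_ratio - value_ratio) / value_ratio > tolerance:
            return True  # ink is no longer proportional to quantity
    return False

# A zero-baseline chart is faithful; a truncated axis is not:
proportional_ink_violation([10, 20], axis_min=0)  # → False (ink ratio 2:1 = value ratio 2:1)
proportional_ink_violation([10, 20], axis_min=5)  # → True  (ink ratio 3:1 ≠ value ratio 2:1)
```

In published figures the true quantities are not given, which is why the paper's method must recover bar extents and axis labels from pixel data before applying a check of this kind.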

Highlights

  • Scientists need to understand previous research so as to build on others’ work

  • We manually annotated a sample of bar charts from images of the large PubMed Central Open Access Subset (PMOAS) collection

  • Our human annotators found proportional ink violations in 5.5% of bar charts, whereas our deep learning-based method flagged 2.3%; this discrepancy between the predicted and human-annotated prevalence can result from the threshold our classifier uses to label a bar chart as containing a violation
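The dependence of estimated prevalence on the classification threshold can be seen with a toy example: the same classifier scores yield different prevalence estimates as the decision threshold moves. The scores below are made up for illustration and do not come from the paper's model.

```python
def prevalence(scores, threshold):
    """Fraction of charts classified as violations at a given threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

# Hypothetical classifier scores (probability of violation) for six charts.
scores = [0.10, 0.30, 0.45, 0.55, 0.70, 0.90]

prevalence(scores, 0.5)  # → 0.5  (3 of 6 charts flagged)
prevalence(scores, 0.7)  # → 2/6 ≈ 0.33 (a stricter threshold lowers the estimate)
```

A threshold tuned for high precision will flag fewer charts than human annotators, which is one way the 2.3% model estimate can fall below the 5.5% human-annotated rate.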



Introduction

Scientists need to understand previous research so as to build on others’ work. This goal is best achieved when research findings are conveyed accurately and transparently. While inaccuracies can be the result of honest mistakes, some are research integrity matters [1]. Some studies have estimated that 4% to 9% of scientists have been involved in research misconduct or have observed others’ research misconduct [2,3]. Assigning intentionality can be highly problematic, requiring expert opinion based on quantitative analysis. Regardless of intentionality, early, accurate, and scalable detection of potential problems during pre- and post-publication is a crucial step toward making science more robust.

Methods
Results
Discussion
Conclusion
