Abstract

Peer review is at the heart of scholarly communication and the cornerstone of scientific publishing. However, academia often criticizes the peer-review system as non-transparent, biased, and arbitrary, a flawed process at the heart of science, leading researchers to question its reliability and quality. These problems may also stem from the scarcity of studies on peer-review texts, owing to various proprietary and confidentiality clauses. Peer-review texts could serve as a rich source for Natural Language Processing (NLP) research on understanding the scholarly communication landscape, and thereby help build systems that mitigate these pertinent problems. In this work, we present a first-of-its-kind multi-layered dataset of 1199 open peer-review texts manually annotated at the sentence level (∼17k sentences) across four layers, viz. Paper Section Correspondence, Paper Aspect Category, Review Functionality, and Review Significance. Given a text written by the reviewer, we annotate: to which sections of the paper (e.g., Methodology, Experiments, etc.) and to what aspects (e.g., Originality/Novelty, Empirical/Theoretical Soundness, etc.) the review text corresponds, what role the review text plays (e.g., appreciation, criticism, summary, etc.), and the importance of the review statement (major, minor, general) within the review. We also annotate the sentiment of the reviewer (positive, negative, neutral) for the first two layers to judge the reviewer's perspective on the different sections and aspects of the paper. We further introduce four novel tasks with this dataset, which could serve as indicators of the exhaustiveness of a peer review and a step towards the automatic judgment of review quality. We also present baseline experiments and results for the different tasks to support further investigation. We believe our dataset will provide a benchmark experimental testbed for automated systems that leverage current state-of-the-art NLP techniques to address different issues with peer-review quality, thereby ushering in increased transparency and trust in the holy grail of scientific research validation. Our dataset and associated code are available at https://www.iitp.ac.in/~ai-nlp-ml/resources.html#Peer-Review-Analyze.
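
To make the annotation scheme concrete, the following is a minimal Python sketch of what one annotated review sentence might look like. The class name, field names, and label values are hypothetical illustrations of the four layers and the two sentiment annotations described above, not the dataset's actual release format.

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class ReviewSentenceAnnotation:
        """One review sentence annotated across the four layers.

        Field names and label values are illustrative only; consult the
        released dataset for the actual schema.
        """
        sentence: str
        paper_sections: List[str]       # Layer 1: e.g., ["Methodology", "Experiments"]
        aspect_categories: List[str]    # Layer 2: e.g., ["Originality/Novelty"]
        functionality: str              # Layer 3: e.g., "appreciation", "criticism", "summary"
        significance: str               # Layer 4: "major", "minor", or "general"
        section_sentiment: Optional[str] = None  # sentiment on Layer 1: positive/negative/neutral
        aspect_sentiment: Optional[str] = None   # sentiment on Layer 2: positive/negative/neutral

    # A hypothetical criticism aimed at the Experiments section.
    example = ReviewSentenceAnnotation(
        sentence="The evaluation omits comparisons against recent baselines.",
        paper_sections=["Experiments"],
        aspect_categories=["Empirical/Theoretical Soundness"],
        functionality="criticism",
        significance="major",
        section_sentiment="negative",
        aspect_sentiment="negative",
    )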

Highlights

  • The peer-review process is the only widely accepted method of research validation

  • The most significant aspect, Empirical and Theoretical Soundness, is judged chiefly against the Methodology section, followed by Experiments, Results, Related Work, Problem Definition, Data, and Analysis

  • Estimating peer-review quality is a crucial problem for the health of science and strengthens the gatekeeper of scientific knowledge and wisdom


Introduction

The peer-review process is the only widely accepted method of research validation. The meta-research and science-of-science [11] communities have long invested in studying the annals of the peer-review process [12,13,14,15,16,17]. Alarmed by some of the glaring problems at the core of the review process [18], coupled with the exponential increase in research paper submissions, the larger research community (not just Meta Science) has felt the need to study the paper-vetting system and to build proposals towards mitigating its problems [19]. Ensuring the quality of peer reviews is a time-critical problem.
