Abstract

Background

This paper presents the first meta-analysis of the inter-rater reliability (IRR) of journal peer reviews. IRR is defined as the extent to which two or more independent reviews of the same scientific document agree.

Methodology/Principal Findings

Altogether, 70 reliability coefficients (Cohen's Kappa, intra-class correlation [ICC], and Pearson product-moment correlation [r]) from 48 studies were included in the meta-analysis. The studies were based on a total of 19,443 manuscripts; on average, each study had a sample size of 311 manuscripts (minimum: 28, maximum: 1,983). The results of the meta-analysis confirmed the findings of the narrative literature reviews published to date: the level of IRR (mean ICC/r² = .34, mean Cohen's Kappa = .17) was low. To explain the study-to-study variation of the IRR coefficients, meta-regression analyses were calculated using seven covariates. Two covariates emerged as statistically significant in the meta-regression analyses needed to obtain approximately homogeneous intra-class correlations: first, the more manuscripts a study is based on, the smaller the reported IRR coefficients are; second, studies that reported information on the rating system used by reviewers showed smaller IRR coefficients than studies that did not convey this information.

Conclusions/Significance

Studies that report a high level of IRR are therefore to be considered less credible than those with a low level of IRR. According to our meta-analysis, the IRR of peer assessments is quite limited and needs improvement (e.g., through a reader system).
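The pooling step described above can be illustrated with a short sketch. The following Python code is not the authors' estimation procedure; it is a generic DerSimonian-Laird random-effects pooling of Fisher-z-transformed intra-class correlations, and the per-study ICC values and manuscript counts are invented, shown only to make the notion of a summary IRR estimate and of between-study heterogeneity concrete.

# Hedged illustration: generic random-effects pooling (DerSimonian-Laird),
# not the paper's exact procedure; all study data below are invented.
import numpy as np

icc = np.array([0.21, 0.38, 0.45, 0.29, 0.12])   # hypothetical per-study ICCs
n   = np.array([950, 120, 60, 310, 1400])        # manuscripts per study

# Fisher z-transform stabilizes the variance of correlation-type coefficients
z = np.arctanh(icc)
v = 1.0 / (n - 3)            # approximate sampling variance of z

# Fixed-effect estimate and Cochran's Q heterogeneity statistic
w = 1.0 / v
z_fixed = np.sum(w * z) / np.sum(w)
q = np.sum(w * (z - z_fixed) ** 2)

# DerSimonian-Laird estimate of the between-study variance tau^2
k = len(z)
tau2 = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))

# Random-effects pooled estimate, back-transformed to the correlation metric
w_star = 1.0 / (v + tau2)
z_pooled = np.sum(w_star * z) / np.sum(w_star)
print(f"pooled ICC = {np.tanh(z_pooled):.2f}, tau^2 = {tau2:.4f}")

A meta-regression, as used in the paper, would additionally regress the transformed coefficients on covariates such as the number of manuscripts per study or whether the rating system was reported.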

Highlights

  • Science rests on journal peer review [1]

  • In this study, we present the first meta-analysis of the reliability of journal peer reviews

  • The results of our analyses confirmed the findings of narrative reviews: a low level of inter-rater reliability (IRR), with .34 for the intra-class correlation (ICC) and Pearson r, and .17 for Cohen's Kappa



Introduction

Science rests on journal peer review [1]. As stated in a British Academy report, "the essential principle of peer review is simple to state: it is that judgements about the worth or value of a piece of research should be made by those with demonstrated competence to make such a judgement." According to Marsh, Bond, and Jayasinghe [4], the most important weakness of the peer review process is that the ratings given to the same submission by different reviewers typically differ; this results in a lack of inter-rater reliability (IRR). All overviews of the literature on the reliability of peer reviews published so far come to the same conclusion: there is a low level of IRR [5,6,7,8]. These reviews describe the existing literature using the narrative technique, without attempting any quantitative synthesis of study results. What are the determinants of a high or low level of IRR [7]?
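To make concrete what an IRR coefficient measures at the level of a single study, the sketch below computes the three coefficient types later pooled in the meta-analysis (Pearson r, Cohen's Kappa, and a one-way intra-class correlation) for two hypothetical reviewers rating the same manuscripts. The ratings and the five-point scale are invented for illustration and do not come from any of the primary studies.

# Hedged illustration with invented data: IRR coefficients for one study
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings of eight manuscripts by two independent reviewers
# (1 = clear reject ... 5 = clear accept)
reviewer_a = np.array([4, 2, 5, 3, 1, 4, 2, 5])
reviewer_b = np.array([3, 2, 4, 4, 2, 5, 1, 4])

# Pearson product-moment correlation for graded ratings
r, _ = pearsonr(reviewer_a, reviewer_b)

# Cohen's Kappa: chance-corrected agreement for categorical recommendations
kappa = cohen_kappa_score(reviewer_a, reviewer_b)

# One-way random-effects intra-class correlation, ICC(1,1): the share of
# total rating variance attributable to manuscripts rather than reviewers
ratings = np.stack([reviewer_a, reviewer_b], axis=1)   # (n manuscripts, k raters)
n, k = ratings.shape
grand_mean = ratings.mean()
ms_between = k * ((ratings.mean(axis=1) - grand_mean) ** 2).sum() / (n - 1)
ms_within = ((ratings - ratings.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))
icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

print(f"r = {r:.2f}, kappa = {kappa:.2f}, ICC(1,1) = {icc:.2f}")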

