Peer review is a decisive factor in selecting research grant proposals for funding. The usefulness of peer review depends in part on agreement among multiple reviewers judging the same proposal, and on each reviewer's consistency in judging proposals. Peer reviewers are also instructed to disregard characteristics that are not among the evaluation criteria. Nevertheless, characteristics such as the gender of the investigator or the reviewer may be associated with differing evaluations. This experiment sought to characterize the psychometric properties of peer review among 605 experienced peer reviewers and to examine possible differences in peer review judgments based on reviewer and investigator gender. Participants evaluated overall impact statements written in the style of National Institutes of Health (NIH) primary reviewers. Each statement summarized the study's purpose, its overall evaluation, and its strengths and weaknesses in five criterion areas: significance, approach, investigator, innovation, and environment. Evaluations were generally consistent between reviewers, and within reviewers over a two-week period. However, proposals with weaknesses were judged less consistently. Regarding gender, women reviewers tended to give more positive evaluations, and women investigators received better overall evaluations. Unsuccessful grant applicants use reviewer feedback to improve their proposals, a task that inconsistent reviews could make more difficult. Peer reviewer training and calibration could increase reviewer consistency, which, according to this study's results, is especially relevant for proposals with weaknesses. Evidence of systematic differences in proposal scores based on investigator and reviewer gender may also point to the value of calibration and training. For example, peer reviewers could score practice proposals and discuss their differences before independently scoring their assigned proposals.