Abstract
Peer review is commonly used across science as a tool to evaluate the merit and potential impact of research projects and to make funding recommendations. While potential impact is likely difficult to assess ex-ante, relatively few attempts have been made to gauge the predictive accuracy of review decisions using impact measures of the results of the completed projects. Although many outputs, and thus potential measures of impact, exist for research projects, the overwhelming majority of evaluation of research output is focused on bibliometrics. We review the multiple types of potential impact measures with an interest in their application to validating review decisions. A review of the current literature on validating peer review decisions with research output impact measures is presented here; only 48 studies were identified, about half of which were US-based, and sample sizes per study varied greatly. 69% of the studies employed bibliometrics as a research output. While 52% of the studies employed alternative measures (such as patents and technology licensing, post-project peer review, international collaboration, future funding success, securing tenure-track positions, and career satisfaction), only 25% of all studies used more than one measure of research output. Overall, 91% of studies with unfunded controls and 71% of studies without such controls provided evidence for at least some level of predictive validity of review decisions. However, several studies also reported sizable type I and type II errors. Moreover, many of the observed effects were small, and several studies suggest only coarse discriminatory power: review was able to separate poor proposals from better ones but showed no discrimination amongst the top-tier proposals or applicants (although discriminatory ability depended on the impact metric). This is of particular concern in an era of low funding success rates. More research is needed, particularly in integrating multiple types of impact indicators in these validity tests and in considering the context of the research outputs relative to the goals of the research program and concerns for reproducibility, translatability, and publication bias. In parallel, more research is needed on the internal validity of review decision-making procedures and reviewer bias.
Highlights
Funded applicants outperformed unfunded applicants in obtaining faculty positions, R01 grant success, and publications, but these differences diminished when controlling for review score
From the results summarized in this review, it seems that peer review likely does have some coarse discrimination in determining the level and quality of output from research funding, suggesting the system has some level of validity; admittedly, the span of funding agencies and mechanisms included in this review complicates generalization somewhat
While it may be able to separate good proposals from flawed ones, discrimination amongst the top-tier proposals or applicants may be more difficult, which is what the system is currently charged to do given recent funding levels (Fang et al., 2016). This seems to depend on the metric used, as some studies found a high degree of discrimination when tracking the career success of funded and top-tier unfunded applicants (Fang and Meyer, 2003; Hornbostel et al., 2009; Escobar-Alvarez and Myers, 2013), although the effects of funding itself have to be teased out (Bol et al., 2018)
Summary
Funded applicants had higher citations than unfunded applicants, both ex-ante and ex-post (though with significant type I/II error)
Funded applicants had more research products and higher levels of collaboration than unfunded applicants
Funded applicants had higher ex-post NIH funding success than unfunded yet highly ranked (ex-ante) applicants