Can ChatGPT evaluate research quality?

Mike Thelwall

doi:10.2478/jdis-2024-0013

Abstract

Purpose: Assess whether ChatGPT 4.0 is accurate enough to perform research evaluations on journal articles to automate this time-consuming task.

Design/methodology/approach: Test the extent to which ChatGPT-4 can assess the quality of journal articles using a case study of the published scoring guidelines of the UK Research Excellence Framework (REF) 2021 to create a research evaluation ChatGPT. This was applied to 51 of my own articles and compared against my own quality judgements.

Findings: ChatGPT-4 can produce plausible document summaries and quality evaluation rationales that match the REF criteria. Its overall scores have weak correlations with my self-evaluation scores of the same documents (averaging r=0.281 over 15 iterations, with 8 being statistically significantly different from 0). In contrast, the average scores from the 15 iterations produced a statistically significant positive correlation of 0.509. Thus, averaging scores from multiple ChatGPT-4 rounds seems more effective than individual scores. The positive correlation may be due to ChatGPT being able to extract the author’s significance, rigour, and originality claims from inside each paper. If my weakest articles are removed, then the correlation with average scores (r=0.200) falls below statistical significance, suggesting that ChatGPT struggles to make fine-grained evaluations.

Research limitations: The data is self-evaluations of a convenience sample of articles from one academic in one field.

Practical implications: Overall, ChatGPT does not yet seem to be accurate enough to be trusted for any formal or informal research quality evaluation tasks. Research evaluators, including journal editors, should therefore take steps to control its use.

Originality/value: This is the first published attempt at post-publication expert review accuracy testing for ChatGPT.
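The comparison reported in the findings, per-iteration correlations versus the correlation of scores averaged over iterations, can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration only: the data is synthetic and randomly generated, not the article's data; the 1 to 4 score scale, the noise level, and the helper names `pearson` and `average_over_iterations` are hypothetical; only the counts (51 articles, 15 iterations) come from the abstract. It is not the author's code and makes no attempt to reproduce the reported values.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_over_iterations(runs):
    """Mean score per article, where runs[i][j] is article j's score in run i."""
    return [sum(col) / len(col) for col in zip(*runs)]


# Synthetic placeholder data (NOT the article's data): 51 articles, 15 runs.
rng = random.Random(0)
human = [rng.choice([1, 2, 3, 4]) for _ in range(51)]  # assumed 1-4 scale
runs = [[h + rng.gauss(0, 2.0) for h in human] for _ in range(15)]  # noisy runs

# Correlation of each individual run with the human scores, then their mean.
per_run_r = [pearson(human, run) for run in runs]
mean_r = sum(per_run_r) / len(per_run_r)

# Correlation of the per-article averaged scores with the human scores.
avg_r = pearson(human, average_over_iterations(runs))

print(f"mean of per-run correlations: {mean_r:.3f}")
print(f"correlation of averaged scores: {avg_r:.3f}")
```

Averaging across runs reduces run-to-run noise in each article's score, which is one plausible reason an averaged score could track human judgements more closely than any single run.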

Journal: Journal of Data and Information Science
Publication Date: Apr 30, 2024
Citations: 4
License type: CC BY 4.0