Automated Paper Screening for Clinical Reviews Using Large Language Models: Data Analysis Study.

Eddie Guo,Jiawen Deng,Ye-Jean Park,Michael Paget,Mehul Gupta,Christopher Naugler

doi:10.2196/48996

Eddie Guo, Jiawen Deng + Show 4 more

Open Access

https://doi.org/10.2196/48996

Copy DOI

Journal: Journal of medical Internet research	Publication Date: Jan 12, 2024
Citations: 24	License type: cc-by

Affiliation: University of Calgary

Abstract

The systematic review of clinical research papers is a labor-intensive and time-consuming process that often involves the screening of thousands of titles and abstracts. The accuracy and efficiency of this process are critical for the quality of the review and subsequent health care decisions. Traditional methods rely heavily on human reviewers, often requiring a significant investment of time and resources. This study aims to assess the performance of the OpenAI generative pretrained transformer (GPT) and GPT-4 application programming interfaces (APIs) in accurately and efficiently identifying relevant titles and abstracts from real-world clinical review data sets and comparing their performance against ground truth labeling by 2 independent human reviewers. We introduce a novel workflow using the Chat GPT and GPT-4 APIs for screening titles and abstracts in clinical reviews. A Python script was created to make calls to the API with the screening criteria in natural language and a corpus of title and abstract data sets filtered by a minimum of 2 human reviewers. We compared the performance of our model against human-reviewed papers across 6 review papers, screening over 24,000 titles and abstracts. Our results show an accuracy of 0.91, a macro F1-score of 0.60, a sensitivity of excluded papers of 0.91, and a sensitivity of included papers of 0.76. The interrater variability between 2 independent human screeners was κ=0.46, and the prevalence and bias-adjusted κ between our proposed methods and the consensus-based human decisions was κ=0.96. On a randomly selected subset of papers, the GPT models demonstrated the ability to provide reasoning for their decisions and corrected their initial decisions upon being asked to explain their reasoning for incorrect classifications. Large language models have the potential to streamline the clinical review process, save valuable time and effort for researchers, and contribute to the overall quality of clinical reviews. By prioritizing the workflow and acting as an aid rather than a replacement for researchers and reviewers, models such as GPT-4 can enhance efficiency and lead to more accurate and reliable conclusions in medical research.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

R Discovery Prime

R Discovery Prime

Automated Paper Screening for Clinical Reviews Using Large Language Models: Data Analysis Study.

Abstract

Talk to us

Similar Papers

More From: Journal of medical Internet research

Lead the way for us

Similar Papers

E-185 Customized generative pretrained transformer for simplified patient education of carotid angioplasty and stenting: a feasibility study
A Brake ... E Samaniego
Journal of NeuroInterventional Surgery | VOL. 16
A Brake, et. al.A Brake ... E Samaniego
01 Jul 2024
Journal of NeuroInterventional Surgery | VOL. 16

A guideline-informed language model for paediatric cardiology demonstrates high performance in answering complex medical questions
T Uden ... P Beerbaum
European Heart Journal | VOL. 45
T Uden, et. al.T Uden ... P Beerbaum
28 Oct 2024
European Heart Journal | VOL. 45

How Can IJDS Authors, Reviewers, and Editors Use (and Misuse) Generative AI?
Galit Shmueli ... Bianca Maria Colosimo
INFORMS Journal on Data Science | VOL. 2
Galit Shmueli, et. al.Galit Shmueli ... Bianca Maria Colosimo
01 Apr 2023
INFORMS Journal on Data Science | VOL. 2

Evaluating the Performance of Large Language Models in Hematopoietic Stem Cell Transplantation Decision Making
Ivan Civettini ... Paola Perfetti
Blood | VOL. 142
Ivan Civettini, et. al.Ivan Civettini ... Paola Perfetti
02 Nov 2023
Blood | VOL. 142

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

Automated Paper Screening for Clinical Reviews Using Large Language Models: Data Analysis Study.

Abstract

Talk to us

Similar Papers

More From: Journal of medical Internet research