Abstract

Background
The time-consuming screening phase in health-related evidence syntheses is increasingly supported by artificial intelligence (AI). However, scoping reviews have not benefited as much as systematic reviews from such AI tools, as they use conceptual rather than keyword-specific search strategies to address broad research questions. Context-understanding chatbots based on large language models could potentially enhance the efficiency of scoping review screening. This study evaluates the performance of ChatGPT against an open-access AI tool used for abstract screening in systematic reviews, as well as the costs involved.

Methods
Leveraging data from a prior scoping review, we compared the performance of ChatGPT 4.0 and 3.5 against Rayyan, using human researchers' decisions as the benchmark. A random set of 50 included and 50 excluded abstracts was used to train Rayyan's algorithm and to develop ChatGPT's prompt. ChatGPT 4.0's evaluation was repeated after 5-7 days to assess response consistency. We computed performance metrics including sensitivity, specificity, and accuracy.

Results
ChatGPT 4.0 and 3.5 achieved 68% accuracy, 11% precision, 99% negative predictive value, and 67% specificity. Sensitivity was high, at 88-89% for ChatGPT 4.0 and 84% for ChatGPT 3.5. ChatGPT 4.0 showed substantial interrater reliability between its two evaluations and moderate reliability when compared with ChatGPT 3.5. Deployment costs varied: Rayyan was free, ChatGPT 3.5 cost $9.06, and ChatGPT 4.0 cost $505.72.

Conclusions
Given the exponential increase in publications, effective mechanisms to support the screening phase of scoping reviews are needed. Our feasibility study using ChatGPT to decide on abstracts' inclusion or exclusion achieved good performance metrics. Pending further positive evaluation, such chatbots might be incorporated into the standard screening process, possibly replacing a second screener, saving time and costs, and accelerating evidence synthesis.

Key messages
• ChatGPT performs well in supporting screening for scoping reviews, outperforming a traditional AI tool at reasonable costs.
• Employing chatbots like ChatGPT could potentially cut costs and time in scoping reviews.
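
To make the reported metrics concrete, the following Python sketch shows how sensitivity, specificity, accuracy, precision, and negative predictive value can be computed from a 2x2 confusion matrix of screening decisions, with human researchers' decisions as the reference standard. This is an illustrative assumption, not the study's code, and the confusion-matrix counts used in the example are hypothetical.

    # Minimal sketch (hypothetical counts): screening performance metrics
    # from a 2x2 confusion matrix, with human decisions as the reference.

    def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
        """Return sensitivity, specificity, accuracy, precision, and NPV."""
        total = tp + fp + tn + fn
        return {
            "sensitivity": tp / (tp + fn),   # truly included abstracts flagged for inclusion
            "specificity": tn / (tn + fp),   # truly excluded abstracts flagged for exclusion
            "accuracy": (tp + tn) / total,   # overall agreement with human screeners
            "precision": tp / (tp + fp),     # positive predictive value of an "include" call
            "npv": tn / (tn + fn),           # negative predictive value of an "exclude" call
        }

    if __name__ == "__main__":
        # Hypothetical confusion-matrix counts, not the study's data.
        metrics = screening_metrics(tp=44, fp=350, tn=700, fn=6)
        for name, value in metrics.items():
            print(f"{name}: {value:.2f}")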