Artificial intelligence versus human touch: can artificial intelligence accurately generate a literature review on laser technologies?

Frédéric Panthier, Hugh Crawford-Smith, Eduarda Alvarez, Alberto Melchionna, Daniela Velinova, Ikran Mohamed, Siobhan Price, Simon Choong, Vimoshan Arumuham, Sian Allen, Olivier Traxer, Daron Smith

doi:10.1007/s00345-024-05311-8

Abstract

This study compared the accuracy of open-source Artificial Intelligence (AI) Large Language Models (LLMs) with that of human authors in generating a systematic review (SR) on the new pulsed-Thulium:YAG (p-Tm:YAG) laser. Five manuscripts were compared. The Human-SR on p-Tm:YAG (considered the "ground truth") was written by independent certified endourologists with expertise in lasers and had been accepted by a peer-reviewed, PubMed-indexed journal (but was not yet available online, and was therefore not accessible to the LLMs). The query "write a systematic review on pulsed-Thulium:YAG laser for lithotripsy" was submitted to four LLMs (ChatGPT3.5/Vercel/Claude/Mistral-7b). The LLM-SRs were standardized and the Human-SR was reformatted to match the general output appearance, to ensure blinding. Nine participants with various levels of endourological expertise (three Clinical Nurse Specialists, Urology Trainees and Consultants) objectively assessed the accuracy of the five SRs using a bespoke 10-"checkpoint" proforma. A subjective assessment was recorded using a composite score including quality (0-10), clarity (0-10) and overall manuscript rank (1-5). The Human-SR was objectively and subjectively more accurate than the LLM-SRs (96 ± 7% and 86.8 ± 8.2% respectively; p < 0.001). The LLM-SRs did not differ significantly from one another, although ChatGPT3.5 showed numerically greater subjective and objective accuracy scores (62.4 ± 15% and 29 ± 28% respectively; p > 0.05). Quality and clarity assessments were significantly affected by SR type but not by expertise level (p < 0.001 and p > 0.05, respectively). LLM-generated data on highly technical topics are less accurate than those produced by Key Opinion Leaders. LLMs, especially ChatGPT3.5, could improve our practice when used under human supervision.


More From: World journal of urology


Similar Papers

How Can IJDS Authors, Reviewers, and Editors Use (and Misuse) Generative AI?
Galit Shmueli ... Bianca Maria Colosimo
INFORMS Journal on Data Science | VOL. 2
01 Apr 2023

The rise of artificial intelligence: addressing the impact of large language models such as ChatGPT on scientific publications.
Tiing Leong Ang ... Kay Choong See
Singapore Medical Journal | VOL. 64
30 Mar 2023

Getting AI Right: Introductory Notes on AI & Society
James Manyika
Daedalus | VOL. 151
01 May 2022

Can Artificial Intelligence Pass the American Board of Orthopaedic Surgery Examination? Orthopaedic Residents Versus ChatGPT.
Zachary C Lum
Clinical Orthopaedics & Related Research | VOL. 481
23 May 2023
