Leveraging Large Language Models for Decision Support in Personalized Oncology

Manuela Benary, Xing David Wang, Max Schmidt, Dominik Soll, Georg Hilfenhaus, Mani Nassir, Christian Sigler, Maren Knödler, Ulrich Keller, Dieter Beule, Ulrich Keilholz, Ulf Leser, Damian T Rieke

doi:10.1001/jamanetworkopen.2023.43689

Open Access

Abstract

Clinical interpretation of complex biomarkers for precision oncology currently requires manual investigation of previous studies and databases. Conversational large language models (LLMs) might be beneficial as automated tools for assisting clinical decision-making. The objective of this study was to assess the performance of 4 recent LLMs as support tools for precision oncology and to define their role.

This diagnostic study examined 10 fictional cases of patients with advanced cancer with genetic alterations. In 2023, each case was submitted to 4 different LLMs (ChatGPT, Galactica, Perplexity, and BioMedLM) and 1 expert physician to identify personalized treatment options. Treatment options were masked and presented to a molecular tumor board (MTB), whose members rated the likelihood of a treatment option coming from an LLM on a scale from 0 to 10 (0, extremely unlikely; 10, extremely likely) and decided whether the treatment option was clinically useful. The main outcomes were the number of treatment options; the precision, recall, and F1 score of LLMs compared with human experts; and the recognizability and usefulness of recommendations.

For the 10 fictional patients with cancer (4 with lung cancer, 6 with other cancer types; median [IQR], 3.5 [3.0-4.8] molecular alterations per patient), the human expert identified a median (IQR) of 4.0 (4.0-4.0) treatment options per patient, compared with 3.0 (3.0-5.0), 7.5 (4.3-9.8), 11.5 (7.8-13.0), and 13.0 (11.3-21.5) for the 4 LLMs, respectively. When the expert was considered the criterion standard, LLM-proposed treatment options reached F1 scores of 0.04, 0.17, 0.14, and 0.19 across all patients combined. Combining treatment options from different LLMs yielded a precision of 0.29 and a recall of 0.29, for an F1 score of 0.29. LLM-generated treatment options received a median (IQR) rating of 7.5 (5.3-9.0) points for the likelihood of being AI-generated, in contrast to 2.0 (1.0-3.0) points for manually annotated cases. A crucial reason for identifying treatment options as AI-generated was insufficient accompanying evidence. For each patient, at least 1 LLM generated a treatment option that MTB members considered helpful. Two unique useful treatment options (including 1 unique treatment strategy) were identified only by LLMs.

In this diagnostic study, treatment options of LLMs in precision oncology did not reach the quality and credibility of human experts; however, they generated helpful ideas that might have complemented established procedures. Considering technological progress, LLMs could play an increasingly important role in assisting with screening and selecting relevant biomedical literature to support evidence-based, personalized treatment decisions.
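The precision, recall, and F1 figures above treat the expert's treatment options as the criterion standard and pool options across all patients. A minimal sketch of that kind of calculation is shown below. It assumes that options are matched exactly as (case, treatment) pairs; the abstract does not describe the matching rule. The case identifiers, drug assignments, and function names are hypothetical and are not data or code from the study.

```python
# Illustrative only: hypothetical cases and options, not data from the study.

def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(proposed, reference):
    """Score a pooled set of proposed options against a reference set.

    Both arguments are iterables of (case_id, treatment) pairs, so that
    the same drug suggested for two different cases counts separately.
    """
    proposed, reference = set(proposed), set(reference)
    true_positives = len(proposed & reference)
    precision = true_positives / len(proposed) if proposed else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall, f1_from(precision, recall)


# Hypothetical expert annotations, used as the criterion standard.
expert = {
    ("case1", "osimertinib"),
    ("case1", "afatinib"),
    ("case2", "olaparib"),
    ("case2", "trametinib"),
}

# Hypothetical outputs of two models.
llm_a = {("case1", "osimertinib"), ("case1", "erlotinib"), ("case2", "crizotinib")}
llm_b = {("case2", "olaparib"), ("case2", "pembrolizumab")}

score_a = precision_recall_f1(llm_a, expert)
# Combining models is modeled here as the union of their proposals.
score_combined = precision_recall_f1(llm_a | llm_b, expert)

print("LLM A:    precision=%.2f recall=%.2f F1=%.2f" % score_a)
print("Combined: precision=%.2f recall=%.2f F1=%.2f" % score_combined)
```

Because F1 is the harmonic mean of precision and recall, equal values give back the same number: `f1_from(0.29, 0.29)` evaluates to 0.29, matching the combined figures reported in the abstract. In this toy example the union of the two models raises recall, but in general a union also adds more unmatched options, which can lower precision.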

Journal: JAMA Network Open
Publication Date: Nov 17, 2023
Citations: 47
License type: CC BY
