Large language models for precision oncology: Clinical decision support through expert-guided learning.

Jacqueline Lammert,Alisa M Lörsch,Nicole Pfarr,Anna Durner,Carolin Mogler,Kristina Schwamborn,Johannes Jung,Marion B Kiechle,Maximilian Tschochohei,Tobias F Dreyer,Sebastian Lange,Ulrich A Schatz,Sonja Mathes,Anna Lena Illert,Christof Winter

doi:10.1200/jco.2024.42.16_suppl.e13609

Abstract

e13609 Background: Precision oncology revolutionized cancer treatment by identifying molecular biomarkers to guide personalized care. The ever-growing body of medical literature presents a challenge for oncologists researching targeted therapies. While recent studies investigated large language models (LLMs) to streamline this process, LLM reliance on general rather than medical knowledge limits clinical relevance and trustworthiness. To address these limitations, we developed a retrieval augmented generation (RAG) system that integrates PubMed clinical studies, trial databases and oncological guidelines with LLMs to support targeted treatment recommendations. The Molecular Tumor Board (MTB) at the Center of Personalized Medicine (ZPMTUM) guided and evaluated treatment options proposed by the LLM to assess their applicability for clinical decision support. Methods: We used 10 publicly accessible fictional patient cases with 7 tumor types and 59 distinct molecular alterations. Our LLM system MEREDITH (Medical Evidence Retrieval and Data Integration for Tailored Healthcare) consists of Google's Gemini Pro, enhanced with RAG and Chain-of-Thought (CoT) prompting. To establish a benchmark, clinical experts at ZPMTUM manually annotated the cases. Informed by MTB expert feedback, we iteratively improved our LLM system from a draft system relying on PubMed-indexed data to an enhanced system, which replicated expert annotation processes by incorporating oncology guidelines, drug availability and trial databases (ClinicalTrials.gov, QuickQueck.de). ZPMTUM assessed credibility and clinical relevance of manually annotated and LLM-generated recommendations. Patient-level data on (likely) pathogenic molecular alterations and recommended treatment options were summarized using median and interquartile range (IQR). Semantic similarity between LLM and clinician responses was assessed using cosine similarity of text vector embeddings; paired t-test evaluated significance. Results: The median of (likely) pathogenic molecular alterations per patient was 2.5 (IQR: 2-3). ZPMTUM identified a median of 2 treatment options per patient (IQR: 1-3), while the enhanced LLM identified a median of 4 (IQR: 3-5). MEREDITH proposed multiple relevant treatment suggestions, including therapies based on preclinical studies, and molecular interactions, for further assessment by the MTB. ZPMTUM prioritized the most suitable clinical option. The mean semantic textual similarity of LLM responses increased significantly from 0.69 in the draft system to 0.76 in the enhanced system (p <0.001). Thus, feedback from ZPMTUM enhanced the model's ability to align its responses with clinician thought processes. Conclusions: Leveraging expert thought processes to instruct LLMs holds promise as a novel decision support tool for precision oncology.

Full Text