Enhancing Clinical Trial Selection for Cancer Patients Using Large Language Models.

  • Abstract
  • Literature Map
  • Similar Papers
Abstract

Identifying appropriate clinical trials for cancer patients with specific gene mutations remains a significant challenge, largely due to limitations in current search tools such as ClinicalTrials.gov, which at times return irrelevant or misleading results. This diagnostic accuracy study investigates the efficacy of 2 large language models (LLMs), GPT-4.0 and Gemini 2.0, in evaluating the eligibility of patients with specific cancer-related gene mutations for clinical trials. The study prompts GPT-4.0 and Gemini 2.0 with trial details from ClinicalTrials.gov and a particular cancer mutation, and model performance is then assessed against physician-curated benchmarks across 6 gene mutations (ALK, BRAF, EGFR, ERBB2, KIT, and KRAS). The results demonstrate good F1-scores for both LLMs (averaging 64% for GPT-4.0 and 70% for Gemini 2.0), highlighting their potential to streamline clinical trial matching. Furthermore, decision trees provided interpretability by identifying the key textual indicators that the LLMs rely on. This work demonstrates the feasibility of using proprietary LLMs such as GPT-4.0 and Gemini 2.0 "off the shelf", with limited fine-tuning and limited patient information, to evaluate clinical trial eligibility.
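The prompting step described above can be sketched as follows. The prompt wording and the `parse_verdict` heuristic are illustrative assumptions, not the authors' exact pipeline.

```python
# Sketch: pair a ClinicalTrials.gov record with a mutation of interest and
# map the model's free-text reply to a yes/no eligibility verdict.

def build_prompt(trial_text, mutation):
    """Assemble an eligibility question from trial details and a mutation."""
    return (f"Trial description:\n{trial_text}\n\n"
            f"Is a patient whose tumor harbors a {mutation} mutation "
            f"potentially eligible for this trial? Answer yes or no.")

def parse_verdict(llm_response):
    """Map a free-text model reply to an eligibility boolean."""
    return llm_response.strip().lower().startswith("yes")
```

Each verdict can then be compared with the physician-curated benchmark to accumulate the true/false positives and negatives behind the reported F1-scores.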

Similar Papers
  • Research Article
  • Cited by: 38
  • 10.1053/j.gastro.2021.06.079
Diversity and Inclusion in Pancreatic Cancer Clinical Trials
  • Aug 17, 2021
  • Gastroenterology
  • Kelly M Herremans + 3 more

  • Research Article
  • Cited by: 4
  • 10.1007/s00520-023-08112-8
Status and influential factors of spiritual well-being in cancer patients with drug clinical trials: a cross-sectional study.
  • Oct 19, 2023
  • Supportive Care in Cancer
  • Xue Hu + 4 more

The purpose of this study was to investigate the spiritual well-being of cancer patients enrolled in drug clinical trials and its influencing factors, and to provide theoretical support for spiritual health interventions for these patients. This cross-sectional study was conducted among 244 cancer patients in clinical trials. The Memorial Symptom Assessment Scale Short Form (MSAS-SF), the Connor-Davidson Resilience Scale 10 (CD-RISC 10), and the Functional Assessment of Chronic Illness Therapy-Spiritual (FACIT-SP-12) were used to measure symptom burden, psychological resilience, and spiritual well-being. A multiple linear regression model was used to identify the factors influencing patients' spiritual health. The overall spiritual health level of cancer patients in clinical trials was high (36.87 ± 11.0), and spiritual health was positively correlated with psychological resilience (r = 0.872, P < 0.001). Religious belief, nationality, treatment regimen, and resilience were independent factors influencing the spiritual health of cancer patients in clinical trials. Patients with religious beliefs (β = 0.097, P = 0.012), patients from ethnic minorities (β = 0.087, P = 0.023), and patients with high resilience scores (β = 0.874, P < 0.001) had higher levels of spiritual health, while patients who received a single antineoplastic therapy (β = -0.079, P = 0.028) had lower levels. Our study found that the spiritual health of cancer patients in clinical trials was at a high level, superior to that of cancer patients receiving conventional anti-tumor therapy. Religious belief, nationality, treatment regimen, and psychological resilience were its influential factors.
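The reported correlation between resilience and spiritual well-being (r = 0.872) is a Pearson coefficient, which reduces to the covariance of the two samples scaled by their standard deviations. A minimal sketch, on made-up data rather than the study's:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```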

  • Research Article
  • Cited by: 1
  • 10.1200/jco.2025.43.16_suppl.e23161
Aiding data retrieval in clinical trials with large language models: The APOLLO 11 Consortium in advanced lung cancer patients.
  • Jun 1, 2025
  • Journal of Clinical Oncology
  • Federica Corso + 19 more

e23161 Background: Data retrieval is challenging in clinical research; traditional methods for data collection are often time-consuming and can be error-prone. Large language models (LLMs) have shown zero-shot capabilities in converting unstructured clinical text into structured data. These technologies could support the retrieval stage of clinical trials by leveraging the information reported in electronic health records (EHRs), reducing the reliance on manual curation. The APOLLO 11 Consortium (NCT05550961) is a multicenter Italian trial that leverages a federated infrastructure for the analysis of advanced lung cancer patient data across Italy. Methods: We conducted a pilot study using Llama 3.1 8B on 358 non-small cell lung cancer patients from the IRCCS Istituto Nazionale dei Tumori, leader of the APOLLO 11 Consortium. Anonymized EHRs were analyzed with the LLM feature-extraction pipeline by Wiest et al. A combination of zero- and few-shot prompting techniques, in both English and Italian, was used. We selected smoking, histology, PD-L1, and staging as multiclass variables and bone/brain/liver metastases as binary variables. Ground truth collection involved a first manual data entry (1-MDE) and a final, fully revised MDE (2-MDE). LLM accuracy was calculated only for the comparison of LLM vs 2-MDE. In addition, we calculated the percentage of missing information (% MI) in 1-MDE, 2-MDE, and the LLM extraction. Results: Compared with 2-MDE, the LLM achieved feature-specific accuracies of 0.78 for PD-L1, 0.85 for bone metastasis, 0.83 for brain metastasis, 0.89 for liver metastasis, and 0.96 for tumour staging. For smoking and staging, LLM extraction also reduced % MI relative to 1-MDE (Table 1). For PD-L1 only, we further analyzed the 12.8% of MI and found that 91.3% resulted from hallucinations (i.e., PD-L1 was misclassified as missing). Evaluations using English prompts confirmed the pipeline's adaptability and high task accuracy.
Conclusions: This study confirms the feasibility of LLMs for data retrieval in clinical trials, demonstrating strong performance across diverse clinical features with minimal prompt optimization. LLMs could assist clinicians and data entry personnel in the 1-MDE process, streamlining initial data structuring and saving time. The 2-MDE step can remain as a quality check to address any discrepancies. Further improvements could focus on prompt optimization and on integrating human feedback to reduce hallucination rates. Clinical trial information: NCT05550961.

% MI in 1-MDE, 2-MDE, and LLM extraction. Accuracy refers only to LLM vs 2-MDE. Histology and metastasis sites were collected only in 2-MDE. NA = not available.

                            Smoking  PD-L1  Histology  Bone Met  Brain Met  Liver Met  T     N     M      Stage
% MI 1-MDE                  6.4      8.9    NA         NA        NA         NA         22.5  22.5  23.11  98.3
% MI 2-MDE                  6.6      3      0          0         0          0          0     0     0      0
% MI LLM                    2.7      12.8   10.3       0         0          0          0     0     0      6.9
% accuracy (LLM vs 2-MDE)   67       78     91         85        83         89         39    52    70     96
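The two metrics tabulated above, percent missing information and accuracy against the revised manual entry, reduce to simple counts. A minimal sketch with an invented missing-value marker and invented labels, not APOLLO 11 data:

```python
MISSING = "MI"  # assumed marker for a value the extractor could not retrieve

def percent_missing(values):
    """Share of records where a feature was not extracted, in percent."""
    return 100.0 * sum(v == MISSING for v in values) / len(values)

def accuracy_vs_truth(extracted, truth):
    """Agreement with the revised manual data entry (2-MDE), in percent."""
    hits = sum(e == t for e, t in zip(extracted, truth))
    return 100.0 * hits / len(truth)
```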

  • Research Article
  • Cited by: 20
  • 10.1177/17407745251320806
From RAGs to riches: Utilizing large language models to write documents for clinical trials
  • Feb 27, 2025
  • Clinical Trials (London, England)
  • Nigel Markey + 4 more

Background/Aims: Clinical trials require numerous documents to be written: protocols, consent forms, clinical study reports, and many others. Large language models offer the potential to rapidly generate first-draft versions of these documents; however, there are concerns about the quality of their output. Here, we report an evaluation of how good large language models are at generating sections of one such document, the clinical trial protocol. Methods: Using an off-the-shelf large language model, we generated protocol sections for a broad range of diseases and clinical trial phases. We assessed each of these document sections across four dimensions: clinical thinking and logic; transparency and references; medical and clinical terminology; and content relevance and suitability. To improve performance, we used the retrieval-augmented generation method to enhance the large language model with accurate, up-to-date information, including regulatory guidance documents and data from ClinicalTrials.gov. Using this retrieval-augmented generation large language model, we regenerated the same protocol sections and assessed them across the same four dimensions. Results: We find that the off-the-shelf large language model delivers reasonable results, especially for content relevance and the correct use of medical and clinical terminology, with scores of over 80%. However, it shows limited performance in clinical thinking and logic and in transparency and references, with assessment scores of ≈40% or less. The use of retrieval-augmented generation substantially improves the writing quality of the large language model, with clinical thinking and logic and transparency and references scores increasing to ≈80%. The retrieval-augmented generation method thus greatly improves the practical usability of large language models for clinical trial-related writing. Discussion: Our results suggest that hybrid large language model architectures, such as the retrieval-augmented generation method we utilized, offer strong potential for clinical trial-related writing, including a wide variety of documents. This is potentially transformative, since it addresses several major bottlenecks of drug development.
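The retrieval-augmented generation step described above amounts to retrieving relevant reference passages and prepending them to the prompt. The word-overlap retriever below is a deliberately minimal stand-in for a production-grade retriever; the corpus and task strings are invented:

```python
def retrieve(query, corpus, k=2):
    """Rank reference passages by word overlap with the query (toy retriever)."""
    q = set(query.lower().split())
    return sorted(corpus, key=lambda doc: -len(q & set(doc.lower().split())))[:k]

def build_rag_prompt(task, corpus):
    """Prepend the top retrieved passages so the LLM can ground its draft."""
    context = "\n".join(retrieve(task, corpus))
    return f"Reference material:\n{context}\n\nTask: {task}"
```

In practice the corpus would hold regulatory guidance documents and ClinicalTrials.gov records, and retrieval would use embeddings rather than word overlap, but the prompt-assembly shape is the same.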

  • Research Article
  • Cited by: 1
  • 10.1200/jco.2024.42.16_suppl.11081
Performance of a trained large language model to provide clinical trial recommendation in a head and neck cancer population.
  • Jun 1, 2024
  • Journal of Clinical Oncology
  • Tony Hung + 12 more

11081 Background: Chatbots based on large language models (LLMs) have demonstrated the ability to answer oncology exam questions; however, leveraging LLMs for medical decision support has not yet demonstrated suitable performance in oncology practice. We evaluated the performance of a trained LLM, GPT-4, in recommending appropriate clinical trials for a head & neck (HN) cancer population. Methods: In 2022, we developed an artificial intelligence-powered clinical trial management mobile app, LookUpTrials, and demonstrated promising user engagement among oncologists. Using the LookUpTrials database, we applied direct preference optimization to train GPT-4 as an in-app assistant to LookUpTrials. From Nov 7 to Dec 19, 2023, we collected consecutive new patient cases and their respective clinical trial recommendations from oncologists in the HN medical oncology service at Memorial Sloan Kettering Cancer Center. Cases were categorized by diagnosis, cancer stage, treatment setting, and physician recommendation on clinical trials. The trained GPT-4 was prompted using a semi-structured template: "Given a patient with a <diagnosis>, <cancer stage>, <treatment setting>, what are possible clinical trials?" Physician recommendations were compared with the trained GPT-4's responses. We analyzed the performance of GPT-4 based on its response precision (positive predictive value), recall (sensitivity), and F1 score (harmonic mean of precision and recall). Results: We analyzed 178 patient cases, mean age 65.6 (SD 13.9), primarily male (75%) with local/locally advanced (68%) HN (61%), thyroid (16%), skin (9%), or salivary (8%) cancers. The majority were treated in the definitive setting with combined modality therapy (42%), and a modest proportion were treated under clinical trials (10%). Overall, the trained GPT-4 achieved moderate performance in matching physician clinical trial recommendations, with 63% precision and 100% recall (F1 score 0.77), narrowing a total list of 56 HN clinical trials to a range of 0-4 relevant trials per patient case (mean 1, SD 1.2). Comparatively, the performance of our trained GPT-4 exceeded the historic performance of untrained LLMs at providing oncology treatment recommendations by 4- to 20-fold (F1 score 0.04-0.19). Conclusions: This proof-of-concept study demonstrated that a trained LLM can achieve moderate performance in matching physician clinical trial recommendations in HN oncology. Our results suggest the potential of embedding trained LLMs into the oncology workflow to aid clinical trial search and accelerate clinical trial accrual. Future research is needed to optimize the precision of trained LLMs and to assess whether they may be a scalable solution to enhance the diversity and equity of clinical trial participation.
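The precision, recall, and F1 figures above reduce to set overlap between the model's recommended trials and the physician's list for each case. A minimal sketch with hypothetical trial IDs:

```python
def recommendation_scores(recommended, physician_picked):
    """Precision, recall, and F1 of a recommended trial set vs physician picks."""
    recommended, physician_picked = set(recommended), set(physician_picked)
    tp = len(recommended & physician_picked)
    precision = tp / len(recommended) if recommended else 0.0
    recall = tp / len(physician_picked) if physician_picked else 1.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```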

  • Research Article
  • 10.1200/jco.2025.43.16_suppl.e13627
Accuracy of large language models for matching cancer patients to biomarker-driven clinical trials based on molecular profiles.
  • Jun 1, 2025
  • Journal of Clinical Oncology
  • Nourya A Cohen + 2 more

e13627 Background: Clinical trials are an important treatment option for cancer patients, in particular when standard treatment modalities have been exhausted. With the availability of somatic genome profiling and targeted therapies, an increasing number of biomarker-driven precision medicine trials are being conducted. It is a challenge even for experts to keep up with the number of relevant trials and to find all available options for a patient. In this study we explore whether large language models (LLMs) could be used to augment the matching of trials to eligible patients based on their genomic profile and other information. Methods: We compiled a dataset of 678 full-text descriptions of currently recruiting cancer clinical trials from clinicaltrials.gov and de-identified profiles from 100 patients with solid tumors. The profiles included basic demographics, a high-level diagnosis (e.g. lung adenocarcinoma), and somatic mutations from panel testing. We built an automated system to supply the trial and patient information to different LLMs, prompt the models to suggest suitable trials for each patient, and automatically retrieve the results. We benchmarked the accuracy of the suggested trials against a manually curated ground-truth dataset of 1107 positive patient-trial matches. Limitations of current LLMs are that they are not trained on real-time data (i.e. up-to-date clinical trial information) and have a limited context window for providing trial data in the query. We therefore tried different strategies for condensing and pre-filtering the trial data: 1. summarizing the full-text trial description into a single page using an LLM (ChatGPT); 2. pre-filtering the trials by keyword search for the primary site of the patient (e.g. lung). Results: Of the LLMs we compared, Gemini had the highest accuracy across the 100 patient profiles (sensitivity 45%, specificity 22%).
Without pre-filtering for primary site, sensitivity increased to 50% while specificity decreased to 16%, likely because keyword searching eliminates some valid trials with broad criteria (e.g. solid tumors). When providing trial data in full-text (not summarized), the accuracy decreased (Sn 37%, Sp 14% with pre-filtering, Sn 30%, Sp 5% without pre-filtering), suggesting that summarization is beneficial and too much information dilutes the matching. Common error modes included conflating lexicographically similar but clinically distinct entities, such as KRAS vs. NRAS and G12C vs. G12D, reflecting the probabilistic rather than exact matching behavior of LLMs. Conclusions: It can be assumed that patients will use available resources, including publicly available LLMs to search for treatment options. Our results show that LLM based approaches can find relevant potential trial options but are not comprehensive and also include a large number of false positives and therefore need to be interpreted with caution.
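The keyword pre-filtering strategy, and the basket-trial failure mode it introduces, can be illustrated with a toy filter; the trial texts are invented:

```python
def prefilter_by_site(trials, primary_site):
    """Keep only trials whose text mentions the patient's primary site.

    Note the trade-off discussed above: basket trials written for
    'solid tumors' never mention the specific site, so they are silently
    dropped, which lowers sensitivity.
    """
    site = primary_site.lower()
    return [t for t in trials if site in t.lower()]
```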

  • Front Matter
  • 10.1053/j.jfas.2011.09.001
Mitigating Administrative Risks in Industry-sponsored Clinical Trials
  • Sep 10, 2011
  • The Journal of Foot and Ankle Surgery
  • Paul J Kim

  • Abstract
  • 10.1093/annonc/mdz423.001
76P - Perception and satisfaction of cancer patients in clinical trials
  • Nov 1, 2019
  • Annals of Oncology
  • J Jeon + 2 more

  • Research Article
  • Cited by: 2
  • 10.1016/j.ijmedinf.2024.105746
CPRS: A Clinical Protocol Recommendation System based on LLMs
  • Mar 1, 2025
  • International Journal of Medical Informatics
  • Jingkai Ruan + 4 more

  • Research Article
  • 10.1200/jco.2025.43.16_suppl.e12610
From trials to clinics: Real-world outcomes of neoadjuvant HER2 directed therapy in early stage breast cancer using AI-enhanced data pipelines.
  • Jun 1, 2025
  • Journal of Clinical Oncology
  • Jim Zhongning Chen + 6 more

e12610 Background: Human epidermal growth factor receptor 2 (HER2)-positive breast cancers were historically associated with more aggressive disease before HER2-targeted therapies. Neoadjuvant combination regimens such as TCHP (docetaxel, carboplatin, trastuzumab, and pertuzumab) have demonstrated superior efficacy in clinical trials, achieving high pathological complete response (pCR) rates. However, translating these trial outcomes to real-world community settings remains challenging. Real-world evidence (RWE) outside the controlled trial environment is limited, partly due to the labor-intensive nature of manually abstracting data from unstructured clinical notes. Scalable solutions, such as large language models (LLMs), offer a promising alternative by enabling more efficient data extraction. In this study, we utilized LLMs to extract and analyze clinical data, comparing outcomes in patients treated within the American Oncology Network (AON) to those reported in clinical trials. Methods: We conducted a retrospective study of patient records from AON. Patients with HER2-positive invasive ductal carcinoma (stages I-III) diagnosed between January 2018 and March 2024 who received neoadjuvant TCHP were included. Clinical data were obtained from structured fields and manual chart abstraction and validated by oncology experts. Concurrently, we developed a locally hosted, quantized LLM pipeline to parse unstructured physician notes for key variables, including treatment timing, neoadjuvant intent, HER2-targeted therapy, and pCR status. Patient pCR rates were compared with published clinical trial results using the chi-squared test. Results: A total of 335 eligible patients were identified from multiple community oncology clinics across 20 states. Stage I, II, and III disease accounted for 18%, 58%, and 24% of cases, respectively. The overall pCR rate was 52.63%, aligning with pCR rates reported in clinical trials such as KRISTINE, NEOSPHERE, PEONY, and TRYPHAENA.
LLM-based abstraction achieved 96% accuracy for determining pCR and 87% for identifying neoadjuvant therapy details, reducing manual review time by over 98%. Conclusions: This retrospective analysis demonstrates that HER2-positive breast cancer patients who received neoadjuvant TCHP within AON achieved pCR rates comparable to those reported in clinical trials, confirming its efficacy in real-world populations. Additionally, the integration of LLMs significantly reduced labor and time, enabling a more efficient and scalable approach to RWE studies. These findings underscore the transformative potential of AI in advancing cancer research.

Comparison of pCR between AON and clinical trials.

Trial/RWE   Sample Size   Pathological Complete Response   P-value
AON         115           52.6%                            *
KRISTINE    222           55.7%                            0.47
NeoSphere   107           45.8%                            0.23
PEONY       218           39.5%                            0.72
TRYPHAENA   77            66.2%                            0.08
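The chi-squared comparison of pCR rates above is a standard 2x2 test of responders versus non-responders in two cohorts. A minimal sketch of the Pearson statistic (without the p-value lookup), on made-up counts rather than the study's:

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]],
    e.g. rows = cohorts, columns = pCR yes/no."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = 0.0
    for obs, row, col in ((a, row1, col1), (b, row1, col2),
                          (c, row2, col1), (d, row2, col2)):
        expected = row * col / n
        stat += (obs - expected) ** 2 / expected
    return stat
```

In practice one would use `scipy.stats.chi2_contingency`, which also returns the p-value and applies continuity correction for 2x2 tables.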

  • Research Article
  • Cited by: 1
  • 10.1097/mou.0000000000001281
Large language models for automating clinical trial matching.
  • Mar 20, 2025
  • Current opinion in urology
  • Ethan Layne + 6 more

The uses of generative artificial intelligence (GAI) technologies in medicine are expanding, with the use of large language models (LLMs) for matching patients to clinical trials of particular interest. This review provides an overview of the current ability to leverage LLMs for clinical trial matching, examining recent studies that assess the performance of LLMs in oncologic clinical trial matching. Research in this area has shown promising results when testing these systems on artificially created datasets; in general, the studies examined how LLMs can be used to match patient health records with clinical trial eligibility criteria. There is still a need for human oversight of these systems in their current state. Automated clinical trial matching can improve patient access and autonomy, reduce provider workload, and increase trial enrollment. However, it may create a feeling of "false hope" for patients, can be difficult to navigate, and still requires human oversight. Providers may face a learning curve, while institutions must address data privacy concerns and ensure seamless EMR/EHR integration. Given this, additional studies are needed to ensure the safety and efficacy of LLM-based clinical trial matching in oncology.

  • Research Article
  • 10.1200/op.2025.21.10_suppl.621
Design and feasibility of lay clinical trial summaries using large language models.
  • Oct 1, 2025
  • JCO Oncology Practice
  • Brenda Adjei + 3 more

621 Background: Effective communication between clinical researchers and participants is vital for successful clinical trials. However, informed consent documents often contain complex language, making it difficult for many patients to fully understand key information and make informed decisions. This comprehension gap impacts patient experience and impedes trial enrollment. Large language models (LLMs) show promise for translating specialized medical content into accessible language; however, their effectiveness in clinical trial communication remains underexplored. Multimodal approaches combining LLM-generated content with visual aids may significantly enhance participant understanding and trust in the consent process. This study aims to design and test the feasibility of creating a lay clinical trial summary using LLM technology. Methods: This feasibility pilot leveraged an LLM to extract key clinical trial information from research protocols and customize it for prospective participants. An initial literature review leveraged FDA guidance and prior work by Hill et al (2024) to establish a template and delineate content requirements. We utilized Amazon Bedrock with Anthropic's Claude 3.7 to engineer and test prompts. We extracted each information topic separately into phrases or short sentences in JavaScript Object Notation (JSON) and used placeholders to insert the content into the template. Clinical experts, communication specialists, and patient advocates reviewed content for accuracy and clarity. Results: Feedback from patient advocates was incorporated to optimize content relevance, literacy level, and acceptability before producing 2 final outputs for IRB review. Iterative prompt optimization using one- to two-shot examples in conversational, second-person language achieved optimal LLM outputs at a 4th- to 6th-grade reading level.
The template was converted from PowerPoint shapes to a Word table for improved text wrapping, visual presentation, and Section 508 accessibility compliance. Human oversight remained essential for content validation and managing text constraints within fixed cell dimensions. Patient reviewers provided highly positive feedback, endorsing bullet points, simplified study titles, and enhanced typography with wider spacing. Graphics and layout enhancements significantly improved engagement compared to traditional consent materials. Final outputs, after human review, were submitted to IRB for approval. Conclusions: Utilizing an LLM increased efficiency in creating summaries, provided consistency in language and structure, and customized trial content. Implementation required balancing risks of oversimplification against benefits, maintaining human oversight for critical evaluation, and ensuring regulatory compliance. A future study will evaluate participant preference for layperson abstract plus consent form versus consent form alone.
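The extract-then-fill step, where LLM-extracted phrases in JSON are slotted into template placeholders, can be sketched with Python's `string.Template`. The field names and template text here are hypothetical, not the study's schema:

```python
import json
from string import Template

def fill_summary(template_text, llm_json):
    """Insert LLM-extracted phrases (JSON) into a lay-summary template."""
    fields = json.loads(llm_json)
    # safe_substitute leaves unknown placeholders intact for human review
    return Template(template_text).safe_substitute(fields)
```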

  • Research Article
  • Cited by: 3
  • 10.3779/j.issn.1009-3419.2020.01.07
Acceptance and Related Causes of Clinical Trials among Cancer Patients in China
  • Jan 20, 2020
  • Chinese Journal of Lung Cancer
  • Huiyao Huang + 14 more

Background and Objective: Clinical trials of new anti-tumor drugs are being conducted vigorously in China, and patients' acceptance of clinical trials is an important factor affecting the speed and quality of trial conduct. Previous studies surveyed acceptance only among cancer patients who had never participated in a clinical trial, without analyzing patients who had. This study surveyed and compared acceptance of clinical trials between cancer patients who had and had not participated in trials, and analyzed the related reasons. Methods: From June 2018 to April 2019, a standardized questionnaire survey of cancer patients (trial participants vs non-participants) was conducted at the Cancer Hospital, Chinese Academy of Medical Sciences; acceptance was compared between the two groups, and the main reasons and the influence of physicians on acceptance were analyzed. Results: A total of 538 patients were included, 51.1% male, mean age 53.5 years; 43.3% had participated in a trial. Overall, 502 patients (93.3%) were willing to participate in a trial or to recommend participation to relatives and friends, with higher acceptance among those who had participated (96.6% vs 90.8%, P=0.008). For both groups, the leading reason for willingness was "expecting the best treatment effect" (100.0% vs 99.3%); the secondary reasons were "reducing the financial burden" (56.0%, participants) and "recommendation by the attending physician" (43.7%, non-participants). Among participants, the main reasons for unwillingness were "giving up other treatment options", "being assigned to the control arm", or "extra visits disrupting daily life"; among non-participants, they were "poor treatment effect" or "serious adverse reactions". For participants, physician recommendation played a decisive role in the enrollment decision for 88% of patients; for non-participants, physician recommendation could change the choice of 60.9% of those initially unwilling. The study also reports patients' preferences regarding channels for obtaining clinical trial information. Conclusion: Acceptance of clinical trials among cancer patients is generally high, especially among those who have participated in a trial. Making full use of the role of attending physicians is of great significance for improving the acceptance of clinical trials among cancer patients in China.

  • Research Article
  • 10.1016/j.surg.2025.109915
Understanding unrealized trial enrollments following patient-to-trial matching with large language models.
  • Mar 1, 2026
  • Surgery
  • Claire T Verhagen + 10 more

  • Front Matter
  • Cited by: 27
  • 10.1136/esmoopen-2020-000924
Clinical research disruption in the post-COVID-19 era: will the pandemic lead to change?
  • Jan 1, 2020
  • ESMO Open
  • Domenica Lorusso + 3 more

The unprecedented situation we are facing has strongly disrupted the rules of clinical research. Nevertheless, for the scientific community, it may represent an opportunity to learn important lessons. The COVID-19 pandemic suggests that it is possible to alleviate redundancy in clinical trials and, while preserving the rigour of a study, to offer a new, less burdened and more inclusive vision of clinical research for the scientific community of tomorrow. This perspective article describes clinicians' vision of how the pandemic could change the rules of clinical research. Since the beginning of the SARS-CoV-2 outbreak in Wuhan, more than 24 million people have been infected around the world and more than 800 000 have died from the disease so far. In this scenario, Europe is facing one of the worst crises that our national health systems have encountered in the last 50 years. Six months after the first COVID-19 diagnosis, the lockdown is being eased in European countries and our lives are slowly adapting to 'a new normality'. Providing care to immunocompromised patients with cancer during this pandemic has been extremely challenging, and oncologists face many challenges in providing cancer care during the COVID-19 outbreak.
Data from China reported that patients with cancer who are infected with COVID-19 are at 3.5 times the risk of requiring mechanical ventilation or intensive care unit (ICU) admission, compared with the general population.1 Additionally, the limitation of resources in outpatient settings, including administrative staff and specialists, has hindered the routine care of patients.2 National and international cancer societies published priority-driven guidelines for the management of oncohaematological patients on therapy during the COVID-19 pandemic and recommended considering treatment delays and modifications on a case-by-case basis, taking into account the characteristics of the patient and the disease.3 In addition to routine patient care, the imperative of …
