GDPKG-LLM: Integrating Gene, Disease, and Pharmacogenomics Knowledge Graphs for Cognitive Neuroscience Using Large Language Models

Abstract

Using large language models (LLMs) to build knowledge graphs that capture the relationships between entities in the cognitive and biological sciences has become an active area of research. Because of the vast background knowledge involved and the deep connections within this domain, traditional machine learning and deep learning approaches are not sufficient. The main goal of this study is to create a comprehensive, integrated knowledge graph (KG) from the combination of three knowledge sources: Gene Ontology (GO), Disease Ontology (DO), and PharmKG. Large language models (LLMs) were used to construct this knowledge base, and the main purpose of the KG is to capture the relationships among genes, diseases, and drugs. The proposed approach, called GDPKG-LLM, consists of several key steps, including entity matching, similarity analysis, graph alignment, and the use of GPT-4. GDPKG-LLM extracted more than 16,800 nodes and 838,000 edges from these three knowledge bases, yielding a rich KG. This graph provides meaningful relationships, making it a valuable resource for future research in personalized medicine and neuroscience. The evaluation criteria reviewed show the superiority of GDPKG-LLM, which strengthens the validity of this model.
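The pipeline steps named in the abstract (entity matching, similarity analysis, graph alignment) can be sketched in a heavily simplified form. Everything below is an illustrative toy, not the authors' implementation: the entity names and triples are invented, and stdlib string similarity stands in for the paper's embedding/LLM-based similarity analysis.

```python
from difflib import SequenceMatcher

def normalize(name):
    return name.lower().strip()

def match_entities(source_a, source_b, threshold=0.9):
    """Pair entities across two sources by string similarity -- a toy
    stand-in for the embedding/LLM-based similarity analysis."""
    pairs = []
    for a in source_a:
        for b in source_b:
            score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs

def merge_graphs(graphs):
    """Union the node and edge sets of several (nodes, edges) sources."""
    nodes, edges = set(), set()
    for g_nodes, g_edges in graphs:
        nodes |= set(g_nodes)
        edges |= set(g_edges)
    return nodes, edges

# Invented fragments standing in for GO, DO, and PharmKG records.
go = ({"BRCA1", "TP53"}, {("BRCA1", "involved_in", "DNA repair")})
do = ({"breast cancer", "TP53"}, {("TP53", "associated_with", "breast cancer")})
pharm = ({"tamoxifen", "breast cancer"}, {("tamoxifen", "treats", "breast cancer")})

shared = match_entities(go[0], do[0])     # TP53 occurs in both sources
nodes, edges = merge_graphs([go, do, pharm])
```

Matched entities would then be collapsed into a single node during graph alignment, which is how a merged KG ends up linking a gene through a disease to a drug.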

Similar Papers
  • Preprint Article
  • 10.2196/preprints.68320
Knowledge Enhancement of Small-Scale Models in Medical Question Answering (Preprint)
  • Nov 3, 2024
  • Xinbai Li + 3 more

BACKGROUND Medical question answering (QA) is essential for various medical applications. While small-scale pre-trained language models (PLMs) are widely adopted in open-domain QA tasks through fine-tuning with related datasets, applying this approach in the medical domain requires significant and rigorous integration of external knowledge. Knowledge-enhanced small-scale PLMs have been proposed to incorporate knowledge bases (KBs) to improve performance, as KBs contain vast amounts of factual knowledge. Large language models (LLMs) contain a vast amount of knowledge and have attracted significant research interest due to their outstanding natural language processing (NLP) capabilities. KBs and LLMs can provide external knowledge to enhance small-scale models in medical QA. OBJECTIVE KBs consist of structured factual knowledge that must be converted into sentences to align with the input format of PLMs. However, these converted sentences often lack semantic coherence, potentially causing them to deviate from the intrinsic knowledge of KBs. LLMs, on the other hand, can generate natural, semantically rich sentences, but they may also produce irrelevant or inaccurate statements. The retrieval-augmented generation (RAG) paradigm enhances LLMs by retrieving relevant information from an external database before responding. By integrating LLMs and KBs using the RAG paradigm, it is possible to generate statements that combine the factual knowledge of KBs with the semantic richness of LLMs, thereby enhancing the performance of small-scale models. In this paper, we explore a RAG fine-tuning method, RAG-mQA, that combines KBs and LLMs to improve small-scale models in medical QA. METHODS In the RAG fine-tuning scenario, we adopt medical KBs as an external database to augment the text generation of LLMs, producing statements that integrate medical domain knowledge with semantic knowledge.
Specifically, KBs are used to extract medical concepts from the input text, while LLMs are tasked with generating statements based on these extracted concepts. In addition, we introduce two strategies for constructing knowledge: KB-based and LLM-based construction. In the KB-based scenario, we extract medical concepts from the input text using KBs and convert them into sentences by connecting the concepts sequentially. In the LLM-based scenario, we provide the input text to an LLM, which generates relevant statements to answer the question. For downstream QA tasks, the knowledge produced by these three strategies is inserted into the input text to fine-tune a small-scale PLM. F1 and exact match (EM) scores are employed as evaluation metrics for performance comparison. Fine-tuned PLMs without knowledge insertion serve as baselines. Experiments are conducted on two medical QA datasets: emrQA (English) and MedicalQA (Chinese). RESULTS RAG-mQA achieved the best results on both datasets. On the MedicalQA dataset, compared to the KB-based and LLM-based enhancement methods, RAG-mQA improved the F1 score by 0.59% and 2.36%, and the EM score by 2.96% and 11.18%, respectively. On the emrQA dataset, the EM score of RAG-mQA exceeded those of the KB-based and LLM-based methods by 4.65% and 7.01%, respectively. CONCLUSIONS Experimental results demonstrate that the RAG fine-tuning method can improve model performance in medical QA. RAG-mQA achieves greater improvements compared to other knowledge-enhanced methods. CLINICALTRIAL This study does not involve trial registration.
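The KB-based construction strategy described above, connecting extracted concepts sequentially into sentences and inserting them into the input text, can be sketched as follows. The triples and helper names are illustrative assumptions, not the paper's code.

```python
def kb_to_sentence(triples):
    """Connect extracted KB concepts sequentially into sentences
    (toy version of the KB-based construction strategy)."""
    clauses = [f"{h} {r.replace('_', ' ')} {t}" for h, r, t in triples]
    return ". ".join(clauses) + "."

def augment_question(question, triples):
    """Insert the constructed knowledge into the input text, which would
    then be used to fine-tune a small-scale PLM."""
    return f"{kb_to_sentence(triples)} {question}"

# Invented triples standing in for concepts extracted from a medical KB.
triples = [("metformin", "treats", "type 2 diabetes"),
           ("type 2 diabetes", "has_symptom", "fatigue")]
augmented = augment_question("What does metformin treat?", triples)
```

The paper's point is that such sequentially connected sentences tend to lack semantic coherence, which is what the LLM-generated (and RAG-constrained) statements are meant to fix.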

  • Research Article
  • 10.1080/13658816.2025.2577252
Extraction of geoprocessing modeling knowledge from crowdsourced Google Earth Engine scripts by coordinating large and small language models
  • Nov 1, 2025
  • International Journal of Geographical Information Science
  • Anqi Zhao + 7 more

The widespread use of online geoinformation platforms, such as Google Earth Engine (GEE), has produced numerous scripts. Extracting domain knowledge from these crowdsourced scripts supports understanding of geoprocessing workflows. Small Language Models (SLMs) are effective for semantic embedding but struggle with complex code; Large Language Models (LLMs) can summarize scripts, yet lack consistent geoscience terminology to express knowledge. In this paper, we propose Geo-CLASS, a knowledge extraction framework for geospatial analysis scripts that coordinates large and small language models. Specifically, we designed domain-specific schemas and a schema-aware prompt strategy to guide LLMs to generate and associate entity descriptions, and employed SLMs to standardize the outputs by mapping these descriptions to a constructed geoscience knowledge base. Experiments on 237 GEE scripts, selected from 295,943 scripts in total, demonstrated that our framework outperformed LLM baselines, including Llama-3, GPT-3.5 and GPT-4o. In comparison, the proposed framework improved accuracy in recognizing entities and relations by up to 31.9% and 12.0%, respectively. Ablation studies and performance analysis further confirmed the effectiveness of key components and the robustness of the framework. Geo-CLASS has the potential to enable the construction of geoprocessing modeling knowledge graphs, facilitate domain-specific reasoning and advance script generation via Retrieval-Augmented Generation (RAG).
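The standardization step in the framework above, mapping free-form LLM-generated entity descriptions onto a controlled geoscience vocabulary, can be sketched with stdlib string matching standing in for the SLM's semantic embeddings. The vocabulary and function names are illustrative assumptions.

```python
from difflib import get_close_matches

# A tiny controlled vocabulary standing in for the geoscience knowledge base.
VOCAB = ["NDVI computation", "cloud masking", "image compositing"]

def standardize(description, vocab=VOCAB):
    """Map a free-form LLM-generated entity description onto the closest
    standardized term. Geo-CLASS uses SLM embeddings for this mapping;
    string matching is only a stand-in here."""
    hits = get_close_matches(description, vocab, n=1, cutoff=0.3)
    return hits[0] if hits else None

term = standardize("compute NDVI")
```

Descriptions with no sufficiently close vocabulary term fall through as `None`, which is where a real pipeline would flag an unmapped entity for review.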

  • Research Article
  • Cited by 17
  • 10.1016/j.omtn.2024.102255
Large language model to multimodal large language model: A journey to shape the biological macromolecules to biological sciences and medicine
  • Jun 15, 2024
  • Molecular Therapy - Nucleic Acids
  • Manojit Bhattacharya + 4 more


  • Research Article
  • Cited by 33
  • 10.1001/jamaophthalmol.2024.2513
Development and Evaluation of a Retrieval-Augmented Large Language Model Framework for Ophthalmology
  • Jul 18, 2024
  • JAMA Ophthalmology
  • Ming-Jie Luo + 13 more

Although augmenting large language models (LLMs) with knowledge bases may improve medical domain-specific performance, practical methods are needed for local implementation of LLMs that address privacy concerns and enhance accessibility for health care professionals. To develop an accurate, cost-effective local implementation of an LLM to mitigate privacy concerns and support its practical deployment in health care settings. ChatZOC (Sun Yat-Sen University Zhongshan Ophthalmology Center), a retrieval-augmented LLM framework, was developed by enhancing a baseline LLM with a comprehensive ophthalmic dataset and evaluation framework (CODE), which includes over 30 000 pieces of ophthalmic knowledge. This LLM was benchmarked against 10 representative LLMs, including GPT-4 and GPT-3.5 Turbo (OpenAI), across 300 clinical questions in ophthalmology. The evaluation, involving a panel of medical experts and biomedical researchers, focused on accuracy, utility, and safety. A double-masked approach was used to minimize bias in the assessment across all models. The study used a comprehensive knowledge base derived from ophthalmic clinical practice, without directly involving clinical patients. LLM response to clinical questions. Accuracy, utility, and safety of LLMs in responding to clinical questions. The baseline model achieved a human ranking score of 0.48. The retrieval-augmented LLM had a score of 0.60, a difference of 0.12 (95% CI, 0.02-0.22; P = .02) from baseline and not different from GPT-4 with a score of 0.61 (difference = 0.01; 95% CI, -0.11 to 0.13; P = .89). For scientific consensus, the retrieval-augmented LLM was 84.0% compared with the baseline model of 46.5% (difference = 37.5%; 95% CI, 29.0%-46.0%; P < .001) and not different from GPT-4 with a value of 79.2% (difference = 4.8%; 95% CI, -0.3% to 10.0%; P = .06).
Results of this quality improvement study suggest that the integration of high-quality knowledge bases improved the LLM's performance in medical domains. This study highlights the transformative potential of augmented LLMs in clinical practice by providing reliable, safe, and practical clinical information. Further research is needed to explore the broader application of such frameworks in the real world.

  • Research Article
  • Cited by 2
  • 10.3390/app15137227
Large-Language-Model-Enabled Text Semantic Communication Systems
  • Jun 26, 2025
  • Applied Sciences
  • Zhenyi Wang + 6 more

Large language models (LLMs) have recently demonstrated state-of-the-art performance in various natural language processing (NLP) tasks, achieving near-human levels in multiple language understanding challenges and aligning closely with the core principles of semantic communication. Inspired by LLMs’ advancements in semantic processing, we propose LLM-SC, an innovative LLM-enabled semantic communication system framework which applies LLMs directly to the physical layer coding and decoding for the first time. By analyzing the relationship between the training process of LLMs and the optimization objectives of semantic communication, we propose training a semantic encoder through LLMs’ tokenizer training and establishing a semantic knowledge base via the LLMs’ unsupervised pre-training process. This knowledge base facilitates the creation of an optimal decoder by providing the prior probability of the transmitted language sequence. Based on this, we derive the optimal decoding criteria for the receiver and introduce a beam search algorithm to further reduce complexity. Furthermore, we assert that existing LLMs can be employed directly for LLM-SC without extra re-training or fine-tuning. Simulation results reveal that LLM-SC outperforms conventional DeepSC at signal-to-noise ratios (SNRs) exceeding 3 dB, as it enables error-free transmission of semantic information under high SNRs while DeepSC fails to do so. In addition to semantic-level performance, LLM-SC demonstrates compatibility with technical-level performance, achieving approximately an 8 dB coding gain for a bit error ratio (BER) of 10⁻³ without any channel coding while maintaining the same joint source–channel coding rate as traditional communication systems.

  • Research Article
  • Cited by 68
  • 10.1038/s41746-024-01024-9
CancerGPT for few shot drug pair synergy prediction using large pretrained language models
  • Feb 19, 2024
  • NPJ Digital Medicine
  • Tianhao Li + 6 more

Large language models (LLMs) have been shown to have significant potential in few-shot learning across various fields, even with minimal training data. However, their ability to generalize to unseen tasks in more complex fields, such as biology and medicine, has yet to be fully evaluated. LLMs can offer a promising alternative approach for biological inference, particularly in cases where structured data and sample size are limited, by extracting prior knowledge from text corpora. Here we report our proposed few-shot learning approach, which uses LLMs to predict the synergy of drug pairs in rare tissues that lack structured data and features. Our experiments, which involved seven rare tissues from different cancer types, demonstrate that the LLM-based prediction model achieves significant accuracy with very few or zero samples. Our proposed model, CancerGPT (with ~124M parameters), is comparable to the larger fine-tuned GPT-3 model (with ~175B parameters). Our research contributes to tackling drug pair synergy prediction in rare tissues with limited data, and also advances the use of LLMs for biological and medical inference tasks.

  • Research Article
  • 10.18523/2617-3808.2024.7.98-101
Modern Approaches to Using Knowledge Bases to Address the Challenges of Large Language Models
  • May 12, 2025
  • NaUKMA Research Papers. Computer Science
  • Maksym Androshchuk

This paper examines the potential of integrating Large Language Models (LLMs) with knowledge bases to improve the accuracy and reliability of their responses. The advantages of such a combination are evaluated, particularly in reducing the risk of hallucinations, the phenomenon where models generate erroneous or fabricated information. Various methodologies for combining LLMs with knowledge bases are analyzed, along with their respective advantages and limitations. The prospects and challenges of implementing this technology in diverse fields, such as information retrieval, decision support, and automated content creation, are discussed. The paper presents an overview of the current state of research in this domain and delineates directions for future investigation. The integration of LLMs with knowledge bases represents a significant advancement in artificial intelligence, addressing one of the key concerns regarding LLMs: their tendency to generate inaccurate or fabricated information, commonly referred to as hallucinations. This approach leverages the vast language understanding and generation capabilities of LLMs while grounding their outputs in structured and verified information from knowledge bases. The synergy between these two technologies has the potential to significantly enhance the reliability and factual accuracy of AI-generated responses across a wide range of applications. The methodologies for combining LLMs with knowledge bases differ in their implementation and effectiveness. Some approaches involve pre-training LLMs on curated knowledge bases, while others reference knowledge bases externally during the inference process. Each method presents its own set of advantages and challenges, such as balancing computational efficiency against accuracy and maintaining the fluency of LLM outputs while adhering strictly to factual information.
The application of this integrated technology extends beyond mere information retrieval, showing promise in complex decision support systems, automated content creation for specialized domains, and contributing to the advancement of explainable AI by providing traceable sources for generated information. As research in this area progresses, it is expected to open new avenues for developing more trustworthy and capable AI systems across various industries and academic disciplines.

  • Research Article
  • Cited by 6
  • 10.1016/j.artmed.2025.103078
Empowering large language models for automated clinical assessment with generation-augmented retrieval and hierarchical chain-of-thought.
  • Apr 1, 2025
  • Artificial intelligence in medicine
  • Zhanzhong Gu + 3 more


  • Research Article
  • 10.1371/journal.pone.0339594
Protocol for a scoping review examining the application of large language models in healthcare education and public health learning spaces
  • Jan 2, 2026
  • PLOS One
  • Henry Ndukwe + 1 more

Objective Through this scoping review, we aim to explore and synthesize existing knowledge and evidence on the learning approaches for incorporating LLMs into healthcare education and public health research and learning spaces. Specifically, we will attempt to investigate methods for auditing prompts for accuracy, fairness, and effectiveness; tailoring prompts to improve task-specific accuracy and utility; and exploring how end-user feedback is used to refine and optimize LLM prompts over time. This review will provide a comprehensive understanding of how LLMs are being tailored and improved in these fields, contributing to the development of evidence-based strategies for their implementation. It will also identify areas for future research and innovation. Introduction The increasing integration of large language models (LLMs) into healthcare education and public health research and learning spaces highlights their potential to revolutionize service delivery, decision-making, and ultimately patient care and outcomes. Despite these advancements, understanding how LLMs can be effectively tailored, audited, and refined for learning remains a critical area of inquiry. Key issues include the accuracy of generated information and its relevance to the medical and public health fields. Inclusion criteria Our focus will be on studies addressing LLM applications in healthcare education and public health research and learning spaces, prompt engineering techniques, prompt auditing methods, and processes geared towards integrating user feedback. Articles that do not focus on healthcare or public health contexts and lack relevance to LLM learning approaches will be excluded. Methods The review is guided by the JBI methodology for scoping reviews, complemented by updates from Levac et al.
Databases including PubMed, Scopus, IEEE Xplore, and Web of Science will be searched for peer-reviewed articles, conference proceedings, and grey literature published in English and French from 2015 to 2025. Data extraction will include information on study characteristics, LLM models, prompt engineering strategies, auditing methodologies, and user feedback mechanisms. We will synthesize the findings to identify trends, gaps, and best practices in leveraging LLMs to generate baseline data for auditing prompts that optimize AI learning and education needs in the healthcare and public health sector.

  • Preprint Article
  • Cited by 3
  • 10.7490/f1000research.1120059.1
TALISMAN: Gene set summarization using Large Language Models
  • Dec 16, 2024
  • Marcin P Joachimiak + 4 more

Molecular biologists frequently interpret gene lists derived from high-throughput experiments and computational analysis. This is typically done as a statistical enrichment analysis that measures the over- or under-representation of biological function terms associated with genes or their properties, based on curated assertions from a knowledge base (KB) such as the Gene Ontology (GO). Interpreting gene lists can also be framed as a textual summarization task, enabling Large Language Models (LLMs) to use scientific texts directly and avoid reliance on a KB. TALISMAN (Terminological ArtificiaL Intelligence SuMmarization of Annotation and Narratives) uses generative AI to perform gene set function summarization as a complement to standard enrichment analysis. This method can use different sources of gene functional information: (1) structured text derived from curated ontological KB annotations, (2) ontology-free narrative gene summaries, or (3) direct retrieval from the model. We demonstrate that these methods are able to generate plausible and biologically valid summary GO term lists for an input gene set. However, LLM-based approaches are unable to deliver reliable scores or p-values and often return terms that are not statistically significant. Crucially, in our experiments these methods were rarely able to recapitulate the most precise and informative term from standard enrichment analysis. We also observe minor differences depending on prompt input information, with GO term descriptions leading to higher recall but lower precision. However, newer LLM models perform statistically significantly better than the oldest model across all performance metrics, suggesting that future models may lead to further improvements. Overall, the results are nondeterministic, with minor variations in prompt resulting in radically different term lists, true to the stochastic nature of LLMs. 
Our results show that at this point, LLM-based methods are unsuitable as a replacement for standard term enrichment analysis, however they may provide summarization benefits for implicit knowledge integration across extant but unstandardized knowledge, for large sets of features, and where the amount of information is difficult for humans to process.
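For contrast, the "standard enrichment analysis" that TALISMAN complements is typically a hypergeometric over-representation test, which a few lines of stdlib Python can illustrate. The gene counts below are made up for the example.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """One-sided over-representation p-value: the probability of seeing at
    least k annotated genes when drawing n genes from a universe of N genes,
    K of which carry the annotation."""
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return total / comb(N, n)

# Invented counts: 8 of 10 query genes carry a GO term that annotates
# only 50 of 1000 genes overall, i.e. strong over-representation.
p = hypergeom_pvalue(8, 10, 50, 1000)
```

This exact p-value is what the abstract means by "reliable scores" that the LLM-based summaries cannot provide: the LLM can name plausible terms, but it has no access to the counting argument behind them.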

  • Research Article
  • Cited by 8
  • 10.37943/15xndz6667
DEVELOPMENT OF A QUESTION ANSWERING CHATBOT FOR BLOCKCHAIN DOMAIN
  • Sep 30, 2023
  • Scientific Journal of Astana IT University
  • Aigerim Mansurova + 2 more

Large Language Models (LLMs), such as ChatGPT, have transformed the field of natural language processing with their capacity for language comprehension and generation of human-like, fluent responses for many downstream tasks. Despite their impressive capabilities, they often fall short in domain-specific and knowledge-intensive domains due to a lack of access to relevant data. Moreover, most state-of-art LLMs lack transparency as they are often accessible only through APIs. Furthermore, their application in critical real-world scenarios is hindered by their proclivity to produce hallucinated information and inability to leverage external knowledge sources. To address these limitations, we propose an innovative system that enhances LLMs by integrating them with an external knowledge management module. The system allows LLMs to utilize data stored in vector databases, providing them with relevant information for their responses. Additionally, it enables them to retrieve information from the Internet, further broadening their knowledge base. The research approach circumvents the need to retrain LLMs, which can be a resource-intensive process. Instead, it focuses on making more efficient use of existing models. Preliminary results indicate that the system holds promise for improving the performance of LLMs in domain-specific and knowledge-intensive tasks. By equipping LLMs with real-time access to external data, it is possible to harness their language generation capabilities more effectively, without the need to continually strive for larger models.

  • Research Article
  • Cited by 5
  • 10.1093/jamia/ocaf059
Detecting emergencies in patient portal messages using large language models and knowledge graph-based retrieval-augmented generation.
  • Apr 12, 2025
  • Journal of the American Medical Informatics Association : JAMIA
  • Siru Liu + 5 more

This study aims to develop and evaluate an approach using large language models (LLMs) and a knowledge graph to triage patient messages that need emergency care. The goal is to notify patients when their messages indicate an emergency, guiding them to seek immediate help rather than using the patient portal, to improve patient safety. We selected 1020 messages sent to Vanderbilt University Medical Center providers between January 1, 2022 and March 7, 2023. We developed four models to triage these messages for emergencies: (1) Prompt-Only: the patient message was input with a prompt directly into the LLM; (2) Naïve Retrieval Augmented Generation (RAG): provided retrieved information as context to the LLM; (3) RAG from Knowledge Graph with Local Search: a knowledge graph was used to retrieve locally relevant information based on semantic similarities; (4) RAG from Knowledge Graph with Global Search: a knowledge graph was used to retrieve globally relevant information through hierarchical community detection. The knowledge base was a triage book covering 225 protocols. The RAG from Knowledge Graph model with global search outperformed other models, achieving an accuracy of 0.99, a sensitivity of 0.98, and a specificity of 0.99. It demonstrated significant improvements in triaging emergency messages compared to LLM without RAG and naïve RAG. The traditional LLM without any retrieval mechanism underperformed compared to models with RAG, which aligns with the expected benefits of augmenting LLMs with domain-specific knowledge sources. Our results suggest that providing external knowledge, especially in a structured manner and in community summaries, can improve LLM performance in triaging patient portal messages. LLMs can effectively assist in triaging emergency patient messages after integrating with a knowledge graph about a nurse triage book. 
Future research should focus on expanding the knowledge graph and deploying the system to evaluate its impact on patient outcomes.
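The retrieval-then-prompt pattern common to the four models above can be sketched in its simplest (naïve RAG, local-search-like) form, with token overlap in place of embeddings. The protocol snippets and wording below are invented for illustration; a real system would retrieve from the full triage book.

```python
def jaccard(a, b):
    """Token-overlap similarity; a toy stand-in for the embedding-based
    semantic similarity used in knowledge-graph local search."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def retrieve(message, protocols, top_k=1):
    """Rank triage protocol snippets by similarity to the patient message."""
    ranked = sorted(protocols, key=lambda p: jaccard(message, p), reverse=True)
    return ranked[:top_k]

def build_prompt(message, protocols):
    """Prepend the retrieved protocol text as context, RAG-style."""
    context = "\n".join(retrieve(message, protocols))
    return f"Context:\n{context}\n\nPatient message:\n{message}\nIs this an emergency?"

# Invented protocol snippets standing in for the 225-protocol triage book.
protocols = ["chest pain shortness of breath call emergency services",
             "medication refill request routine non-urgent"]
prompt = build_prompt("I have chest pain and shortness of breath", protocols)
```

The study's best model goes further by retrieving hierarchical community summaries from the knowledge graph rather than raw snippets, but the prompt assembly step is the same in spirit.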

  • Research Article
  • Cited by 5
  • 10.1016/j.birob.2024.100187
Leveraging large language models for comprehensive locomotion control in humanoid robots design
  • Oct 16, 2024
  • Biomimetic Intelligence and Robotics
  • Shilong Sun + 4 more

This paper investigates the utilization of large language models (LLMs) for the comprehensive control of humanoid robot locomotion. Traditional reinforcement learning (RL) approaches for robot locomotion are resource-intensive and rely heavily on manually designed reward functions. To address these challenges, we propose a method that employs LLMs as the primary designer to handle key aspects of locomotion control, such as trajectory planning, inverse kinematics solving, and reward function design. By using user-provided prompts, LLMs generate and optimize code, reducing the need for manual intervention. Our approach was validated through simulations in Unity, demonstrating that LLMs can achieve human-level performance in humanoid robot control. The results indicate that LLMs can simplify and enhance the development of advanced locomotion control systems for humanoid robots.

  • Research Article
  • Cited by 1
  • 10.2196/66503
Performance Assessment of ChatGPT-4.0 and ChatGLM Series in Traditional Chinese Medicine for Metabolic Associated Fatty Liver Disease: Comparative Study
  • Aug 25, 2025
  • JMIR Formative Research
  • Xionghui Wang + 5 more

Background ChatGPT-4.0 and the ChatGLM series are novel conversational large language models (LLMs). ChatGLM includes 3 versions: ChatGLM4 (with internet connectivity but no knowledge base pretraining), ChatGLM4+Knowledge base (combining internet search capabilities with knowledge base pretraining), and ChatGLM3-6B (offline knowledge base pretraining but no internet connectivity). The ability of ChatGPT-4.0 and ChatGLM to apply medical knowledge in the Chinese environment has been preliminarily verified, but the potential of the 2 models for clinical assistance in traditional Chinese medicine (TCM) is still unknown. Objective This study aims to explore the performance of ChatGPT-4.0, ChatGLM4, ChatGLM4+Knowledge base, and ChatGLM3-6B in providing AI-assisted diagnosis and treatment for metabolic dysfunction-associated fatty liver disease within a TCM clinical framework, thereby assessing their potential as TCM clinical decision support tools. Methods This study evaluated the 4 LLMs by providing them with medical records of 87 metabolic dysfunction-associated fatty liver disease cases treated with TCM and querying them about TCM treatment plans. The answering texts from the 4 LLMs were evaluated using predefined scoring criteria, focusing on 3 critical dimensions: ability in syndrome differentiation and treatment principles, confusion of concepts between TCM and Western medicine, and comprehensive evaluation of question-answering texts (comprising 6 components: ability to integrate Chinese and Western medicine, ability to formulate treatment plans, health management capacity, disease monitoring ability, self-positioning awareness, and medication safety). Results In the evaluation module of “Ability in syndrome differentiation and treatment principles,” the performance ranking of the 4 models was: (1) ChatGLM4+Knowledge Base, (2) ChatGLM4, (3) ChatGLM3-6B, and (4) ChatGPT-4.0.
Regarding the assessment of confusion between TCM and Western medicine concepts, ChatGPT-4.0 exhibited conceptual confusion in 32 out of 87 cases, while the ChatGLM series of LLMs showed no such confusion (except for ChatGLM3-6B, which had 1 instance). In the “Comprehensive evaluation of question-answering texts” module, the ranking was: (1) ChatGLM4+Knowledge Base, (2) ChatGPT-4.0, (3) ChatGLM4, and (4) ChatGLM3-6B. Conclusions Our study results demonstrated that real-time internet connectivity played a critical role in LLM-assisted TCM diagnosis and treatment, while offline models showed significantly reduced performance in clinical decision support. Furthermore, pretraining LLMs with TCM-specific knowledge bases while maintaining internet search capabilities substantially enhanced their diagnostic and therapeutic performance in TCM applications. Importantly, general-purpose LLMs required both domain-specific medical fine-tuning and culturally sensitive adaptation to meet the rigorous standards of TCM clinical practice.

  • Research Article
  • 10.1093/ofid/ofae631.2030
P-1869. Utilizing Large Language Models for Enhanced Decision Support in Travel Medicine Clinic: our experience at Mayo Clinic
  • Jan 29, 2025
  • Open Forum Infectious Diseases
  • John C O’Horo + 5 more

Background The integration of Generative AI (GAI) into healthcare systems is increasingly recognized for its potential to transform patient management. The primary aim of this research was to evaluate and quantify the performance of large language models (LLMs) in generating actionable travel medicine advice. (Figure: architectural design of the Travel Clinic LLM project, with four phases of Discovery, Design, Evaluation, and Implementation/Deployment.) Methods This study utilized two iterative phases of evaluation. In the initial phase, LLMs were prompted with detailed clinical scenarios including demographic data, medical and immunization histories, and specific travel plans. These prompts were designed to mimic typical inquiries encountered in travel consultations. The LLMs’ initial responses were generated using the CDC’s Yellow Book as a foundational knowledge base. In the subsequent phase, the prompts were refined for greater specificity and clarity, and the knowledge base was enhanced by transitioning to Travax’s Travelers’ Health database. Additional structured data inputs included an exhaustive list of vaccines from our pharmacy formulary and a detailed table of vaccine contraindications. The responses were evaluated and scored by ID clinicians from the Mayo Clinic. Results Initial findings after the first iteration revealed limited efficacy, with recall at 23.9%, an F1 score of 38.6%, accuracy also at 23.9%, and precision maintained at 100%, utilizing the CDC's Yellow Book. With the implementation of Travax and refined prompting techniques, preliminary results suggest a notable improvement in the quality of responses, though detailed scoring is presently underway. Improvements in the LLM’s performance can be attributed to several key adjustments: the adoption of a more comprehensive knowledge base, refined prompt engineering, and the incorporation of structured data to support more accurate and detailed recommendations.
The collaborative engagement of Mayo Clinic with Google and Travax facilitated a synergistic approach to optimizing the AI model's utility and integration. Future plans include embedding the LLM into our EMR system. Conclusion The findings from this study highlight the significance of strategic collaborations between large healthcare centers, the IT industry, and specialized knowledge database firms in effectively harnessing GAI for clinical use. Disclosures All Authors: No reported disclosures
