An Automatic and End-to-End System for Rare Disease Knowledge Graph Construction Based on Ontology-Enhanced Large Language Models: Development Study.

Lang Cao, Adam Cross, Jimeng Sun

doi:10.2196/60665

Abstract

Rare diseases affect millions of people worldwide, but individually they may receive limited research attention because of their low prevalence. Many rare diseases do not have specific International Classification of Diseases, Ninth Edition (ICD-9) and Tenth Edition (ICD-10), codes and therefore cannot be reliably extracted from granular fields such as "Diagnosis" and "Problem List" entries. This complicates tasks that require identifying patients with these conditions, including clinical trial recruitment and research efforts. Recent advances in large language models (LLMs) have shown promise in automating the extraction of medical information, offering the potential to improve medical research, diagnosis, and management. However, most LLMs lack professional medical knowledge, especially concerning specific rare diseases, and cannot effectively manage rare disease data in its various ontological forms, making them unsuitable for these tasks.

Our aim is to create an end-to-end system called automated rare disease mining (AutoRD), which automates the extraction of rare disease-related information from medical text, focusing on entities and their relations to other medical concepts, such as signs and symptoms. AutoRD integrates up-to-date ontologies with other structured knowledge and demonstrates superior performance in rare disease extraction tasks. We conducted various experiments to evaluate AutoRD's performance, aiming to surpass common LLMs and traditional methods.

AutoRD is a pipeline system that involves data preprocessing, entity extraction, relation extraction, entity calibration, and knowledge graph construction. We implemented this system using GPT-4 and medical knowledge graphs developed from the open-source Human Phenotype and Orphanet ontologies, using techniques such as chain-of-thought reasoning and prompt engineering. We quantitatively evaluated our system's performance in entity extraction, relation extraction, and knowledge graph construction.
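
The five pipeline stages named above can be illustrated with a minimal, runnable sketch. This is not AutoRD's implementation: the LLM calls are replaced by dictionary matching so the example runs offline, and every name in it (`TOY_ONTOLOGY`, the `TOY:` identifiers, the `has_symptom` relation label, and all function names) is a hypothetical placeholder rather than a real Orphanet or Human Phenotype Ontology entry.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Toy stand-in for ontology knowledge. Terms and IDs are placeholders,
# not real Orphanet or Human Phenotype Ontology entries.
TOY_ONTOLOGY = {
    "disease x": ("TOY:0001", "rare_disease"),
    "muscle weakness": ("TOY:0002", "sign_or_symptom"),
    "fatigue": ("TOY:0003", "sign_or_symptom"),
}


@dataclass(frozen=True)
class Entity:
    text: str
    type: str
    ontology_id: Optional[str] = None


def preprocess(text: str) -> List[str]:
    """Stage 1: split raw text into lowercase sentences (naive splitter)."""
    return [s.strip().lower() for s in text.split(".") if s.strip()]


def extract_entities(sentence: str) -> List[Entity]:
    """Stage 2: find entity mentions.

    Assumption: dictionary matching stands in for an LLM prompt here.
    """
    return [
        Entity(term, TOY_ONTOLOGY[term][1])
        for term in TOY_ONTOLOGY
        if term in sentence
    ]


def extract_relations(entities: List[Entity]) -> List[Tuple[Entity, str, Entity]]:
    """Stage 3: link each rare disease to each symptom in the same sentence.

    Assumption: co-occurrence stands in for LLM-based relation extraction.
    """
    diseases = [e for e in entities if e.type == "rare_disease"]
    symptoms = [e for e in entities if e.type == "sign_or_symptom"]
    return [(d, "has_symptom", s) for d in diseases for s in symptoms]


def calibrate(entity: Entity) -> Entity:
    """Stage 4: attach the ontology identifier and type to an entity."""
    ontology_id, entity_type = TOY_ONTOLOGY[entity.text]
    return Entity(entity.text, entity_type, ontology_id)


def build_graph(text: str) -> Set[Tuple[str, str, str]]:
    """Stage 5: collect calibrated (head, relation, tail) triples."""
    triples = set()
    for sentence in preprocess(text):
        entities = extract_entities(sentence)
        for head, relation, tail in extract_relations(entities):
            triples.add(
                (calibrate(head).ontology_id, relation, calibrate(tail).ontology_id)
            )
    return triples


if __name__ == "__main__":
    text = "Disease X causes muscle weakness and fatigue. Fatigue is common."
    for triple in sorted(build_graph(text)):
        print(triple)
```

Storing the graph as a set of identifier triples means repeated mentions of the same fact collapse into a single edge, which is one simple way to picture how calibration against an ontology could deduplicate extracted knowledge.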
The experiment used the well-curated dataset RareDis2023, which contains medical literature focused on rare disease entities and their relations, making it an ideal dataset for training and testing our methodology. On the RareDis2023 dataset, AutoRD achieved an overall entity extraction F1-score of 56.1% and a relation extraction F1-score of 38.6%, marking a 14.4% improvement over the baseline LLM. Notably, the F1-score for rare disease entity extraction reached 83.5%, indicating high precision and recall in identifying rare disease mentions. These results demonstrate the effectiveness of integrating LLMs with medical ontologies in extracting complex rare disease information.

AutoRD is an automated end-to-end system for extracting rare disease information from text to build knowledge graphs, addressing critical limitations of existing LLMs by improving identification of these diseases and connecting them to related clinical features. This work underscores the significant potential of LLMs in transforming health care, particularly in the rare disease domain. By leveraging ontology-enhanced LLMs, AutoRD constructs a robust medical knowledge base that incorporates up-to-date rare disease information, facilitating improved identification of patients and resulting in more inclusive research and trial candidacy efforts.
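
For readers unfamiliar with the metric reported above, the F1-score is the harmonic mean of precision and recall. The sketch below computes it from a set of predicted items and a set of gold-standard items. It assumes exact-match scoring, which the abstract does not specify, and the function name and toy entities are hypothetical.

```python
from typing import Set, Tuple


def precision_recall_f1(predicted: Set, gold: Set) -> Tuple[float, float, float]:
    """Score predictions against gold annotations using exact set matching."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy (mention, type) pairs, not taken from RareDis2023.
    predicted = {
        ("disease x", "rare_disease"),
        ("fatigue", "sign_or_symptom"),
        ("fever", "sign_or_symptom"),
        ("cough", "sign_or_symptom"),
    }
    gold = {
        ("disease x", "rare_disease"),
        ("fatigue", "sign_or_symptom"),
        ("muscle weakness", "sign_or_symptom"),
    }
    p, r, f1 = precision_recall_f1(predicted, gold)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

The same function applies to relation extraction if each item is a (head, relation, tail) triple instead of an entity pair.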

Journal: JMIR Medical Informatics
Publication Date: Dec 18, 2024
Citations: 1
License type: CC BY
