A Hybrid Model for Family History Information Identification and Relation Extraction: Development and Evaluation of an End-to-End Information Extraction System.

Youngjun Kim, Stéphane M Meystre, Isabel Rh Lally, Paul M Heider

doi:10.2196/22797

Abstract

Background: Family history information is important to assess the risk of inherited medical conditions. Natural language processing has the potential to extract this information from unstructured free-text notes to improve patient care and decision making. We describe the end-to-end information extraction system the Medical University of South Carolina team developed when participating in the 2019 National Natural Language Processing Clinical Challenge (n2c2)/Open Health Natural Language Processing (OHNLP) shared task.

Objective: This task involves identifying mentions of family members and observations in electronic health record text notes and recognizing the 2 types of relations (family member-living status relations and family member-observation relations). Our system aims to achieve a high level of performance by integrating heuristics and advanced information extraction methods. Our efforts also include improving the performance of 2 subtasks by exploiting additional labeled data and clinical text-based embedding models.

Methods: We present a hybrid method that combines machine learning and rule-based approaches. We implemented an end-to-end system with multiple information extraction and attribute classification components. For entity identification, we trained bidirectional long short-term memory deep learning models. These models incorporated static word embeddings and context-dependent embeddings. We created a voting ensemble that combined the predictions of all individual models. For relation extraction, we trained 2 relation extraction models. The first model determined the living status of each family member. The second model identified observations associated with each family member. We implemented online gradient descent models to extract related entity pairs. As part of postchallenge efforts, we used the BioCreative/OHNLP 2018 corpus and trained new models with the union of these 2 datasets. We also pretrained language models using clinical notes from the Medical Information Mart for Intensive Care (MIMIC-III) clinical database.

Results: The voting ensemble achieved better performance than individual classifiers. In the entity identification task, our top-performing system reached a precision of 78.90% and a recall of 83.84%. Our natural language processing system for entity identification took 3rd place out of 17 teams in the challenge. We ranked 4th out of 9 teams in the relation extraction task. Our system substantially benefited from the combination of the 2 datasets. Compared to our official submission with F1 scores of 81.30% and 64.94% for entity identification and relation extraction, respectively, the revised system yielded significantly better performance (P<.05) with F1 scores of 86.02% and 72.48%, respectively.

Conclusions: We demonstrated that a hybrid model could be used to successfully extract family history information recorded in unstructured free-text notes. In this study, our approach to entity identification as a sequence labeling problem produced satisfactory results. Our postchallenge efforts significantly improved performance by leveraging additional labeled data and using word vector representations learned from large collections of clinical notes.
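The F1 scores quoted in the Results are the harmonic mean of precision and recall. The short sketch below is not taken from the authors' system; it only illustrates the standard formula and checks that the reported entity identification precision (78.90%) and recall (83.84%) appear consistent with the official F1 score of 81.30%. The function name `f1_score` is a hypothetical helper introduced here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall.

    Inputs and output share the same unit (here: percent).
    Returns 0.0 when both inputs are zero to avoid division by zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Entity identification figures as reported in the abstract (percent).
entity_f1 = f1_score(78.90, 83.84)
print(f"{entity_f1:.2f}")  # prints 81.30
```

The abstract does not report precision and recall for the relation extraction task or for the revised system, so the same check cannot be repeated for the other F1 scores.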

Highlights

Family history (FH) information included in the electronic health record (EHR) is important to assess the risk of inherited medical conditions
We demonstrated that a hybrid model could be used to successfully extract family history information recorded in unstructured free-text notes
This manuscript describes the end-to-end information extraction (IE) system the Medical University of South Carolina (MUSC) team developed when participating in the 2019 National Natural Language Processing Clinical Challenge (n2c2)/Open Health Natural Language Processing (OHNLP) track on Family history (FH) extraction [6]

Summary

Introduction

Family history (FH) information included in the electronic health record (EHR) is important to assess the risk of inherited medical conditions. Natural language processing (NLP) has the potential to extract this information from unstructured free-text notes to improve patient care and decision making. This manuscript describes the end-to-end information extraction (IE) system the Medical University of South Carolina (MUSC) team developed when participating in the 2019 National Natural Language Processing Clinical Challenge (n2c2)/Open Health Natural Language Processing (OHNLP) track on FH extraction [6]. This shared task is built on the BioCreative/OHNLP 2018 FH extraction task [7].

Journal: JMIR Medical Informatics
Publication Date: Apr 22, 2021
Citations: 8
License type: CC BY
