Abstract

Modern machine learning (ML) technologies hold great promise for automating diverse clinical and research workflows; however, training them requires extensive hand-labelled datasets. Disambiguating abbreviations is important for automated clinical note processing, but broad deployment of ML for this task is restricted by the scarcity and imbalance of labelled training data. In this work we present a method that improves a model's ability to generalize through novel data augmentation techniques that utilize information from biomedical ontologies, in the form of related medical concepts, as well as global context information within the medical note. We train our model on a public dataset (MIMIC-III) and test its performance on automatically generated and hand-labelled datasets from different sources (MIMIC-III, CASI, i2b2). Together, these techniques boost the accuracy of abbreviation disambiguation by up to 17% on hand-labelled data, without sacrificing performance on a held-out test set from MIMIC-III.

Highlights

  • Modern machine learning (ML) technologies have great promise for automating diverse clinical and research workflows; however, training them requires extensive hand-labelled datasets

  • A number of supervised machine learning (ML) models have been built for abbreviation disambiguation in medical notes, including ones based on support vector machines (SVM), Naive Bayes classifiers, and neural networks [3,4,5,6]

  • We map medical concepts in the Unified Medical Language System (UMLS) to the resulting vector space to generate a word embedding for every medical concept
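The highlight above describes mapping multi-word UMLS concepts into a word-vector space. One common way to do this, shown here as a minimal sketch that is not the paper's actual implementation, is to average the embeddings of a concept's constituent words; the tiny embedding table below is invented for illustration.

```python
import numpy as np

# Toy word-vector table (invented values; a real system would use
# embeddings trained on clinical text such as MIMIC-III notes).
word_vectors = {
    "rheumatoid": np.array([0.9, 0.1, 0.0]),
    "arthritis":  np.array([0.8, 0.2, 0.1]),
    "right":      np.array([0.1, 0.9, 0.3]),
    "atrium":     np.array([0.0, 0.8, 0.5]),
}

def concept_embedding(concept: str) -> np.ndarray:
    """Embed a multi-word concept by averaging its tokens' word vectors,
    skipping tokens that are not in the vocabulary."""
    vecs = [word_vectors[w] for w in concept.lower().split() if w in word_vectors]
    return np.mean(vecs, axis=0)

emb = concept_embedding("Rheumatoid Arthritis")  # average of the two token vectors
```

Averaging is a deliberately simple aggregation choice; it gives every concept a vector in the same space as individual words, which is what allows concept-level and word-level context to be compared directly.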


Introduction

Modern machine learning (ML) technologies have great promise for automating diverse clinical and research workflows; however, training them requires extensive hand-labelled datasets. While disambiguating abbreviations is typically simple for an expert in the field, it is a challenging task for automated processing, one that has been addressed by a number of methods going back at least 20 years. These methods largely rely on supervised algorithms, such as Naive Bayes classifiers trained on co-occurrence counts of senses with automatically tagged medical concepts in biomedical abstracts [1]. This approach was motivated by the idea that acronym expansions are related to the topic of the abstract and that topics can be described by the words with the highest TF-IDF weights. Creating hand-labelled medical abbreviation datasets to train and test ML models is costly and difficult, and to the best of our knowledge, the only such publicly available dataset with training data and labels is the Clinical Abbreviation Sense Inventory (CASI) [9], which contains just 75 abbreviations. The sparsity of these datasets makes methods built on them vulnerable to overfitting and inapplicable to abbreviations not present in the training data.
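To make the co-occurrence approach described above concrete, here is a minimal sketch, not the cited systems' actual code, of a Naive Bayes sense disambiguator trained on bag-of-words contexts with add-one smoothing; the abbreviation "RA" and its training sentences are invented for illustration.

```python
from collections import Counter, defaultdict
import math

# Invented training data: contexts labelled with the sense of "RA".
train = [
    ("rheumatoid arthritis", "joint pain and swelling treated with methotrexate"),
    ("rheumatoid arthritis", "chronic joint inflammation and morning stiffness"),
    ("right atrium", "catheter tip positioned in the heart chamber"),
    ("right atrium", "blood returns to the heart via the vena cava"),
]

def train_nb(examples):
    """Count sense frequencies and per-sense word frequencies."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, context in examples:
        sense_counts[sense] += 1
        for w in context.split():
            word_counts[sense][w] += 1
            vocab.add(w)
    return sense_counts, word_counts, vocab

def predict(context, sense_counts, word_counts, vocab):
    """Pick the sense maximizing log P(sense) + sum of log P(word | sense),
    with add-one (Laplace) smoothing over the vocabulary."""
    total = sum(sense_counts.values())
    best, best_lp = None, float("-inf")
    for sense in sense_counts:
        lp = math.log(sense_counts[sense] / total)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in context.split():
            lp += math.log((word_counts[sense][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = sense, lp
    return best

model = train_nb(train)
sense = predict("patient reports joint pain and stiffness", *model)
# → "rheumatoid arthritis"
```

The sketch also illustrates the overfitting risk the paragraph raises: with only a handful of labelled contexts per sense, any word absent from the training data contributes nothing beyond the smoothing term, so senses unseen in training can never be predicted.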

