Unified Medical Language System Concepts Research Articles

Background. Abstracting cerebrovascular disease (CeVD) from inpatient electronic medical records (EMRs) through natural language processing (NLP) is pivotal for automated disease surveillance and improving patient outcomes. Existing methods rely on coders’ abstraction, which suffers from time delays and under-coding. This study sought to develop an NLP-based method to detect CeVD using EMR clinical notes. Methods. CeVD status was confirmed through chart review of randomly selected hospitalized patients who were 18 years or older and discharged from 3 hospitals in Calgary, Alberta, Canada, between January 1 and June 30, 2015. These patients’ chart data were linked to administrative Discharge Abstract Database (DAD) and Sunrise™ Clinical Manager (SCM) EMR database records by Personal Health Number (a unique lifetime identifier) and admission date. We trained multiple NLP predictive models by combining two clinical concept extraction methods and two supervised machine learning (ML) methods: random forest and XGBoost. Using chart review as the reference standard, we compared the models’ performance with that of the commonly applied International Classification of Diseases (ICD-10-CA) codes on sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Results. Of the study sample (n = 3036), the prevalence of CeVD was 11.8% (n = 360), the median patient age was 63, and females accounted for 50.3% (n = 1528), based on chart data. Among 49 clinical documents extracted from the EMR, four document types were identified as the most influential text sources for identifying CeVD: “nursing transfer report,” “discharge summary,” “nursing notes,” and “inpatient consultation.” The best-performing NLP model was XGBoost, combining the Unified Medical Language System concepts extracted by cTAKES (e.g., the top-ranked concepts “Cerebrovascular accident” and “Transient ischemic attack”) with the term frequency-inverse document frequency vectorizer. Compared with ICD codes, the model achieved higher validity overall (ICD codes vs. model): sensitivity 25.0% vs. 70.0%, specificity 99.3% vs. 99.1%, PPV 82.6% vs. 87.8%, and NPV 90.8% vs. 97.1%. Conclusion. The NLP algorithm developed in this study performed better than the ICD code algorithm in detecting CeVD. The NLP models could lead to an automated EMR tool for identifying CeVD cases and could be applied in future work such as surveillance and longitudinal studies.
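
The four validity metrics named in the abstract above can be illustrated with a minimal sketch. It assumes binary labels (1 = CeVD present) and uses chart review as the reference standard; the function name `validity_metrics` and the toy label lists are hypothetical and are not study data.

```python
def validity_metrics(y_true, y_pred):
    """Return sensitivity, specificity, PPV and NPV for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical toy labels for ten patients (NOT data from the study).
chart_review = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # reference standard
icd_flag = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]      # assumed ICD-code-based flag
nlp_flag = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]      # assumed NLP model prediction

icd_metrics = validity_metrics(chart_review, icd_flag)
nlp_metrics = validity_metrics(chart_review, nlp_flag)
```

Comparing `icd_metrics` with `nlp_metrics` mirrors the kind of side-by-side comparison the abstract reports, where one method can gain sensitivity while giving up a little specificity.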

Significance. Early detection of axial spondyloarthritis (axSpA) is a complex clinical task. The quality of axSpA diagnostics in primary care settings can be improved with decision-making information systems based on an ontological approach. The key stage in developing such a system is the elaboration of a set of clinical terms that fully describes the clinical area or sub-area under study. One essential requirement is that the clinical terms comply with existing clinical nomenclatures. Currently, the largest set of clinical terms is the Unified Medical Language System (UMLS) Metathesaurus. The majority of UMLS terms are available in English only. Tools for analyzing unstructured texts and recognizing clinically relevant UMLS entities make it possible to elaborate a set of terms describing the diagnostic aspects of axSpA. They will also help to compile a list of UMLS terminology nomenclatures for priority adaptation and expert translation into Russian. The purpose of this study is to develop an automated system for recognizing clinically relevant UMLS terms in the texts of English-language articles. Material and methods. The research material included English terms (11.2 million) aggregated from 76 nomenclatures of the current UMLS version (2022AB). In addition, the study used the texts of PubMed clinical abstracts in English. Queries to the UMLS graph model, semantic algorithms for unstructured texts, and machine-learning methods were applied for data collection and analysis. Results. The study elaborated a set of high-accuracy regular expressions (F1-score = 98%) for eliminating metadata from the text corpus. The authors then identified patterns for searching for clinically relevant terms in the aggregated set of UMLS concepts. Using a logistic regression algorithm, the authors trained a binary classification model. The input to the classifier is information about a UMLS term; the output is a label indicating the presence or absence of clinical relevance. Conclusion. The binary classification model was validated individually and double-tested on different data samples. Accuracy, sensitivity, and specificity were 91%, 90%, and 91%, respectively, on the validation sample (a number of axSpA terms). In addition, the model was tested on sets of terms aggregated for two arbitrarily chosen diseases, with accuracy of 91% and 90%, respectively. With the help of the developed machine-learning model, the study estimated that UMLS contains 1.5 million unique terms applicable to describing a clinical picture. In addition, lists of priority UMLS data sources and thematic groups were compiled. These clinically relevant UMLS terms should be adapted and translated into Russian as soon as possible.
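
The abstract above mentions regular expressions for eliminating metadata from abstract texts but does not list them. The following is a minimal sketch of that idea only; the three patterns (PMID, DOI, copyright line), the name `strip_metadata`, and the sample sentence are all hypothetical assumptions and are not the authors' expressions.

```python
import re

# Hypothetical metadata patterns; the study's actual expressions are not given.
METADATA_PATTERNS = [
    # PubMed identifier such as "PMID: 12345678"
    re.compile(r"\bPMID:\s*\d+\b", re.IGNORECASE),
    # DOI written as "doi: 10.xxxx/..." or as a doi.org URL
    re.compile(r"\b(?:doi:\s*|https?://doi\.org/)10\.\d{4,9}/\S+", re.IGNORECASE),
    # Copyright statement up to the next full stop
    re.compile(r"(?:©|\(c\)|Copyright)\s*\d{4}[^.]*\.", re.IGNORECASE),
]


def strip_metadata(text):
    """Remove assumed metadata fragments and collapse leftover whitespace."""
    for pattern in METADATA_PATTERNS:
        text = pattern.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical sample text (not taken from the study corpus).
sample = (
    "Early detection of axSpA is complex. "
    "PMID: 12345678 doi: 10.1000/xyz123 "
    "Copyright 2022 Example Publisher."
)
cleaned = strip_metadata(sample)
```

In practice such patterns would be scored against hand-labelled text, which is presumably how an F1-score like the one reported would be obtained; that evaluation step is not sketched here.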

Articles published on Unified Medical Language System Concepts

Streamlining social media information retrieval for public health research with deep learning.

Representation of child and youth participation within the Unified Medical Language System (UMLS)

Complementary and Integrative Health Information in the literature: its lexicon and named entity recognition.

Abstract 17253: Concepts Associated With 30-day Rehospitalization From Unstructured Data Among Patients With Heart Failure

Cerebrovascular disease case identification in inpatient electronic medical record data using natural language processing

The suitability of UMLS and SNOMED-CT for encoding outcome concepts.

LeafAI: query generator for clinical cohort discovery rivaling a human programmer.

Ontology-driven and weakly supervised rare disease identification from clinical notes

An automated system for extracting clinically relevant UMLS terms from the texts of English-language articles, using axial spondyloarthritis as an example

Mapping Chinese Medical Entities to the Unified Medical Language System.

A practical approach to identifying autistic adults within the electronic health record.

Analysis of the Representation of Frequent Clinical Attributes in the Unified Medical Language System.

A configurable software platform for creating, reviewing and adjudicating annotation of unstructured text.

ELaPro, a LOINC-mapped core dataset for top laboratory procedures of eligibility screening for clinical trials

Abstract WP74: An Automated, Electronic Health Record-based Algorithm To Classify Ischemic Stroke Etiology

Text Mining of Adverse Events in Clinical Trials: Deep Learning Approach.

Rare Disease Identification from Clinical Notes with Ontologies and Weak Supervision.

A Silver Standard Biomedical Corpus for Arabic Language

Use of word and graph embedding to measure semantic relatedness between Unified Medical Language System concepts.

Assessing the enrichment of dietary supplement coverage in the Unified Medical Language System.
