Unified Medical Language System Metathesaurus Research Articles

Significance. Early detection of axial spondyloarthritis (axSpA) is a complex clinical task. The quality of axSpA diagnostics in primary care settings can be improved with decision-making information systems based on an ontological approach. The key stage in developing such a system is the elaboration of a set of clinical terms that fully describes the clinical area or sub-area under study. An essential requirement is that the clinical terms comply with existing clinical nomenclatures. Currently, the largest set of clinical terms is the Unified Medical Language System (UMLS) Metathesaurus. The majority of UMLS terms are available in English only. Tools that analyze unstructured texts and recognize clinically relevant UMLS entities make it possible to elaborate a set of terms describing the diagnostic aspects of axSpA. They also help to compile a list of UMLS terminology nomenclatures to prioritize for adaptation and expert translation into Russian. The purpose of this study is to develop an automated system for recognizing clinically relevant UMLS terms in the texts of English-language articles. Material and methods. The research material included 11.2 million English terms aggregated from 76 nomenclatures of the current UMLS version (2022AB), together with the texts of English-language clinical abstracts from PubMed. Queries to the UMLS graph model, semantic algorithms for unstructured texts, and machine-learning methods were applied for data collection and analysis. Results. The study produced a set of high-accuracy regular expressions (F1-score = 98%) for removing metadata from the text corpus. The authors then identified patterns for finding clinically relevant terms in the aggregated set of UMLS concepts. Using a logistic regression algorithm, they trained a binary classification model. The classifier takes information about a UMLS term as input and outputs a label indicating the presence or absence of clinical relevance. Conclusion. The binary classification model was validated separately and then tested twice on different data samples. On the validation sample (a set of axSpA terms), accuracy, sensitivity, and specificity were 91%, 90%, and 91%, respectively. The model was also tested on sets of terms aggregated for two arbitrary diseases, where accuracy was 91% and 90%, respectively. With the help of the developed machine-learning model, the study estimated that UMLS contains 1.5 million unique terms applicable to describing a clinical picture. In addition, lists of priority UMLS data sources and thematic groups were compiled. These clinically relevant UMLS terms should be adapted and translated into Russian as soon as possible.
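The accuracy, sensitivity, and specificity reported above are standard confusion-matrix metrics for a binary classifier. The following is a minimal sketch of how they are computed, not the authors' evaluation code: the function name binary_metrics and the example labels are hypothetical and chosen only for illustration, and the sketch assumes both classes occur in the true labels.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity and specificity for 0/1 labels.

    Label 1 stands for "clinically relevant", label 0 for "not relevant".
    Assumes y_true contains at least one positive and one negative example.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of relevant terms that were found
        "specificity": tn / (tn + fp),  # share of irrelevant terms that were rejected
    }


# Made-up labels for illustration only; these are not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when the two classes are unbalanced, since accuracy alone can hide poor performance on the smaller class.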

Background. How to treat a disease remains the most common type of clinical question. Obtaining evidence-based answers from biomedical literature is difficult. Analogical reasoning with embeddings from deep learning (embedding analogies) may extract such biomedical facts, although the state of the art focuses on pair-based proportional (pairwise) analogies such as man:woman::king:queen (“queen = −man +king +woman”). Objective. This study aimed to systematically extract disease treatment statements with a Semantic Deep Learning (SemDeep) approach underpinned by prior knowledge and another type of 4-term analogy (other than pairwise). Methods. As preliminaries, we investigated Continuous Bag-of-Words (CBOW) embedding analogies in a common-English corpus with five lines of text and observed a type of 4-term analogy (not pairwise) applying the 3CosAdd formula and relating the semantic fields person and death: “dagger = −Romeo +die +died” (search query: −Romeo +die +died). Our SemDeep approach worked with pre-existing items of knowledge (what is known) to make inferences sanctioned by a 4-term analogy (search query −x +z1 +z2) from CBOW and Skip-gram embeddings created with a PubMed systematic reviews subset (PMSB dataset). Stage 1: Knowledge acquisition. Obtaining a set of terms, candidate y, from embeddings using vector arithmetic. Some n-gram pairs from the cosine and validated with evidence (prior knowledge) are the input for the 3CosAdd, seeking a type of 4-term analogy relating the semantic fields disease and treatment. Stage 2: Knowledge organization. Identification of candidates sanctioned by the analogy belonging to the semantic field treatment and mapping these candidates to Unified Medical Language System Metathesaurus concepts with MetaMap. A concept pair is a brief disease treatment statement (biomedical fact). Stage 3: Knowledge validation. An evidence-based evaluation followed by human validation of biomedical facts potentially useful for clinicians. Results. We obtained 5352 n-gram pairs from 446 search queries by applying the 3CosAdd. The microaveraging performance of MetaMap for candidate y belonging to the semantic field treatment was F-measure=80.00% (precision=77.00%, recall=83.25%). We developed an empirical heuristic with some predictive power for clinical winners, that is, search queries bringing candidate y with evidence of a therapeutic intent for target disease x. The search queries -asthma +inhaled_corticosteroids +inhaled_corticosteroid and -epilepsy +valproate +antiepileptic_drug were clinical winners, finding eight evidence-based beneficial treatments. Conclusions. Extracting treatments with therapeutic intent by analogical reasoning from embeddings (423K n-grams from the PMSB dataset) is an ambitious goal. Our SemDeep approach is knowledge-based, underpinned by embedding analogies that exploit prior knowledge. Biomedical facts from embedding analogies (4-term type, not pairwise) are potentially useful for clinicians. The heuristic offers a practical way to discover beneficial treatments for well-known diseases. Learning from deep learning models does not require a massive amount of data. Embedding analogies are not limited to pairwise analogies; hence, analogical reasoning with embeddings is underexploited.
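The search query −x +z1 +z2 described above can be read as a 3CosAdd lookup: build the target vector z1 + z2 − x and rank the remaining vocabulary by cosine similarity to it. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function three_cos_add, the three-dimensional vectors, and the two candidate names (treatment_like_term, unrelated_term) are hypothetical; only the three query tokens come from the abstract.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def three_cos_add(
    emb: Dict[str, Sequence[float]], x: str, z1: str, z2: str, topn: int = 3
) -> List[Tuple[str, float]]:
    """Rank candidate terms y for the query -x +z1 +z2.

    The target vector is z1 + z2 - x; the query terms themselves are
    excluded from the candidates, as is usual for analogy queries.
    """
    target = [b + c - a for a, b, c in zip(emb[x], emb[z1], emb[z2])]
    scored = [
        (term, cosine(vec, target))
        for term, vec in emb.items()
        if term not in {x, z1, z2}
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Toy 3-dimensional vectors invented for illustration; real CBOW or
# Skip-gram embeddings have hundreds of dimensions and are learned from text.
toy_embeddings = {
    "asthma": [1.0, 0.0, 0.0],
    "inhaled_corticosteroids": [0.9, 1.0, 0.0],
    "inhaled_corticosteroid": [0.8, 1.0, 0.1],
    "treatment_like_term": [0.1, 1.0, 0.0],  # hypothetical candidate
    "unrelated_term": [0.0, 0.0, 1.0],       # hypothetical candidate
}

ranked = three_cos_add(
    toy_embeddings, "asthma", "inhaled_corticosteroids", "inhaled_corticosteroid"
)
for term, score in ranked:
    print(term, round(score, 3))
```

With these toy vectors the hypothetical treatment-like candidate ranks first because subtracting the disease vector leaves a target dominated by the dimension the two treatment terms share.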

Articles published on Unified Medical Language System Metathesaurus

A Large Language Model to Detect Negated Expressions in Radiology Reports.

Toward Reliable Symptom Coding in Electronic Health Records for Symptom Assessment and Research: Identification and Categorization of International Classification of Diseases, Ninth Revision, Clinical Modification Symptom Codes.

Retrieval-Based Diagnostic Decision Support: Mixed Methods Study.

Enhancing Medical Image Retrieval with UMLS-Integrated CNN-Based Text Indexing.

Automated System for Extracting Clinically Relevant UMLS Terms from the Texts of English-Language Articles, Using Axial Spondyloarthritis as an Example

Performance assessment of ontology matching systems for FAIR data

Context-Enriched Learning Models for Aligning Biomedical Vocabularies at Scale in the UMLS Metathesaurus.

High Throughput Neurological Phenotyping with MetaMap

Evaluating Biomedical Word Embeddings for Vocabulary Alignment at Scale in the UMLS Metathesaurus Using Siamese Networks.

Customizable Natural Language Processing Biomarker Extraction Tool.

Using the Unified Medical Language System to Expand the Operative Stress Score – First Use Case

PhenClust, a standalone tool for identifying trends within sets of biological phenotypes using semantic similarity and the Unified Medical Language System metathesaurus.

Improving early diagnosis of rare diseases using Natural Language Processing in unstructured medical records: an illustration from Dravet syndrome

Clinical Term Normalization Using Learned Edit Patterns and Subconcept Matching: System Development and Evaluation.

The impact of learning Unified Medical Language System knowledge embeddings in relation extraction from biomedical texts

Semantic Deep Learning: Prior Knowledge and a Type of Four-Term Embedding Analogy to Acquire Treatments for Well-Known Diseases.

Ontological and Non-Ontological Resources for Associating Medical Dictionary for Regulatory Activities Terms to SNOMED Clinical Terms With Semantic Properties.

Construction of Disease Similarity Networks Using Concept Embedding and Ontology.

UMLS mapping and Word embeddings for ICD code assignment using the MIMIC-III intensive care database.

Ontological representation-oriented term normalization and standardization of the Research Domain Criteria.
