Abstract

Significance. Early detection of axial spondyloarthritis (axSpA) is a complex clinical task. The quality of axSpA diagnostics in primary care can be improved with decision-support information systems built on an ontological approach. A key stage in developing such a system is the elaboration of a set of clinical terms that fully describes the clinical area or sub-area under study. An essential requirement is that the terms comply with existing clinical nomenclatures. Currently, the largest collection of clinical terms is the Unified Medical Language System (UMLS) Metathesaurus, and most UMLS terms are available in English only. Tools that analyze unstructured texts and recognize clinically relevant UMLS entities make it possible to elaborate a set of terms describing the diagnostic aspects of axSpA. They will also help to compile a list of UMLS terminology nomenclatures for priority adaptation and expert translation into Russian. The purpose of this study is to develop an automated system for recognizing clinically relevant UMLS terms in the texts of English-language articles.

Material and methods. The research material comprised 11.2 million English terms aggregated from 76 nomenclatures of the current UMLS version (2022AB), together with the texts of English-language clinical abstracts from PubMed. Queries to the UMLS graph model, semantic algorithms for processing unstructured texts, and machine-learning methods were applied for data collection and analysis.

Results. The study produced a set of high-accuracy regular expressions (F1-score = 98%) for removing metadata from the text corpus. The authors then identified patterns for finding clinically relevant terms in the aggregated set of UMLS concepts. Using a logistic regression algorithm, they trained a binary classification model. The classifier takes information about a UMLS term as input and outputs a label indicating the presence or absence of clinical relevance.

Conclusion. The binary classification model was validated and then tested twice on different data samples. On the validation sample (a set of axSpA terms), accuracy, sensitivity, and specificity were 91%, 90%, and 91%, respectively. The model was also tested on sets of terms aggregated for two arbitrarily chosen diseases, where accuracy was 91% and 90%, respectively. Using the developed machine-learning model, the study estimated that UMLS contains 1.5 million unique terms applicable to describing a clinical picture. In addition, lists of priority UMLS data sources and thematic groups were compiled. These clinically relevant UMLS terms should be adapted and translated into Russian as soon as possible.
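
For readers less familiar with the reported metrics, the following is a minimal sketch of how accuracy, sensitivity, and specificity are conventionally computed for a binary classifier. It is not the authors' code: the function name `binary_metrics` and the toy labels are hypothetical, and the label 1 is assumed to mean "clinically relevant".

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, sensitivity and specificity for 0/1 labels.

    Assumption: 1 = clinically relevant term, 0 = not clinically relevant.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # relevant, predicted relevant
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # irrelevant, predicted irrelevant
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # irrelevant, predicted relevant
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # relevant, predicted irrelevant

    total = len(pairs)
    nan = float("nan")
    return {
        # Share of all terms labelled correctly.
        "accuracy": (tp + tn) / total if total else nan,
        # Share of truly relevant terms that were recognized (recall).
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Share of truly irrelevant terms that were rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Hypothetical toy labels for illustration only, not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Reporting sensitivity and specificity alongside accuracy matters when relevant and irrelevant terms are unevenly represented, since accuracy alone can look high even if one class is poorly recognized.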

