Abstract

With growing interest in machine learning, text standardization is becoming an increasingly important aspect of data pre-processing in biomedical communities. Because the performance of machine learning algorithms is affected by both the amount and the quality of their training data, effective data standardization is needed to guarantee consistent data integrity. Furthermore, biomedical organizations, depending on their geographical locations or affiliations, rely on different text standardization practices. To facilitate machine learning-related collaborations between these organizations, an effective yet practical text data standardization method is needed. In this paper, we introduce MARIE (a context-aware term mapping method with string matching and embedding vectors), an unsupervised learning-based tool that finds standardized clinical terminologies for queries such as a hospital’s own codes. By incorporating both string matching methods and term embedding vectors generated by BioBERT (bidirectional encoder representations from transformers for biomedical text mining), it uses both structural and contextual information to calculate similarity measures between source and target terms. Compared to previous term mapping methods, MARIE achieves higher mapping accuracy. Furthermore, it can easily be extended to incorporate any string matching or term embedding method. Because it requires no additional model training, MARIE is not only an effective but also a practical term mapping method for text data standardization and pre-processing.

Highlights

  • Due to the growing interest in text mining and natural language processing (NLP) in the biomedical field [1,2,3], data pre-processing is becoming an increasingly crucial issue for many biomedical practitioners

  • Among various code standards available in the CDM, we limited our mapping to SNOMED CT, Logical Observation Identifiers Names and Codes (LOINC), RxNorm and RxNorm Extension [39,40]

  • We reported the mapping accuracy of the string matching methods and an embedding vector-based mapping method

Introduction

Due to the growing interest in text mining and natural language processing (NLP) in the biomedical field [1,2,3], data pre-processing is becoming an increasingly crucial issue for many biomedical practitioners. Because datasets used in research and practice often differ, it is challenging to apply recent advancements in biomedical NLP directly to existing IT systems without effective data pre-processing. One of the key issues addressed during pre-processing is concept normalization, the task of aligning different text datasets or corpora to a common standard. A sketch of this mapping idea follows below.
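
To make the mapping idea concrete, the sketch below ranks candidate standard terms for a source query by combining a simple string similarity score with the cosine similarity of BioBERT term embeddings, in the spirit of the approach described in the abstract. It is an illustration only, not MARIE's actual scoring function: the mean-pooling strategy, the mixing weight alpha, and the choice of BioBERT checkpoint are assumptions made for this example.

```python
# Illustrative sketch only: combine string similarity (structural) with BioBERT
# embedding similarity (contextual) to rank candidate standard terms for a query.
# The pooling, mixing weight, and checkpoint are assumptions, not MARIE's exact setup.
from difflib import SequenceMatcher

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
model = AutoModel.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
model.eval()


def embed(term: str) -> torch.Tensor:
    """Mean-pool BioBERT's last hidden states into a single term embedding."""
    inputs = tokenizer(term, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape: (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)


def combined_similarity(source: str, target: str, alpha: float = 0.5) -> float:
    """Weighted sum of string similarity and embedding cosine similarity.

    alpha is an assumed mixing weight, not a value taken from the paper.
    """
    string_sim = SequenceMatcher(None, source.lower(), target.lower()).ratio()
    embedding_sim = torch.nn.functional.cosine_similarity(
        embed(source), embed(target), dim=0
    ).item()
    return alpha * string_sim + (1 - alpha) * embedding_sim


# Example: map a hospital's local description to candidate standard terms.
candidates = ["Myocardial infarction", "Cerebral infarction", "Migraine"]
query = "acute MI"
ranked = sorted(candidates, key=lambda c: combined_similarity(query, c), reverse=True)
print(ranked)  # candidates ordered from most to least similar to the query
```

In this setup, the string score captures surface-level (structural) overlap between terms, while the embedding score captures contextual similarity, which is what lets semantically related but lexically different terms still rank highly.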
