Retrofitting Word Embeddings with the UMLS Metathesaurus for Clinical Information Extraction

Mohammed Alawad, Georgia Tourassi, J Blair Christian, S.M Shamimul Hasan

doi:10.1109/bigdata.2018.8621999

Abstract

Deep learning has surged in popularity and proven effective for a range of artificial intelligence applications, including information extraction from cancer pathology reports. Because word representations are the basic unit through which deep learning algorithms process text in NLP tasks, they should encode as much information as possible to help these algorithms achieve high classification performance. In this work, in addition to the distributional information of words in large corpora, we use UMLS vocabulary resources to enrich the vector space representation of words with the semantic relations between them. These resources provide many terminologies pertaining to cancer. The refined word embeddings are used with a convolutional neural network (CNN) model to extract four data elements from cancer pathology reports: ICD-O-3 tumor topography codes, tumor laterality, behavior, and histological grade. CNN models whose word embeddings were enriched with UMLS vocabulary resources consistently outperformed, across all four tasks, both CNN models without pre-trained word embeddings and CNN models with word embeddings pre-trained on a domain-specific corpus. The results show marginal improvement on the laterality task but significant improvement on the other tasks, especially in macro-F score. Specifically, the improvements are 3%, 13%, and 15% for the tumor site, histological grade, and behavior tasks, respectively. These results encourage enriching word embeddings with additional clinical data resources for information abstraction from clinical pathology reports.
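To make the idea of enriching embeddings with semantic relations concrete, the following is a minimal sketch of a generic retrofitting update, in which each word vector is pulled toward the vectors of its lexicon neighbors while staying close to its original distributional vector. The abstract does not state the exact update rule, so this sketch assumes the commonly used iterative averaging scheme; the function name, the toy vectors, and the single "carcinoma"/"cancer" relation are hypothetical and are not taken from UMLS or from the paper.

```python
import numpy as np

def retrofit(embeddings, neighbors, iterations=10, alpha=1.0):
    """Pull each vector toward the mean of its lexicon neighbors.

    Assumed update (not confirmed by the abstract):
        q_i = (alpha * q_i_original + mean(q_j for neighbors j)) / (alpha + 1)
    """
    updated = {word: vec.copy() for word, vec in embeddings.items()}
    for _ in range(iterations):
        for word, related in neighbors.items():
            # Only use neighbors that actually have an embedding.
            linked = [n for n in related if n in updated]
            if word not in updated or not linked:
                continue
            neighbor_mean = np.mean([updated[n] for n in linked], axis=0)
            updated[word] = (alpha * embeddings[word] + neighbor_mean) / (alpha + 1.0)
    return updated

# Toy two-dimensional vectors (hypothetical values for illustration only).
vectors = {
    "carcinoma": np.array([1.0, 0.0]),
    "cancer": np.array([0.0, 1.0]),
    "left": np.array([5.0, 5.0]),
}
# Hypothetical relation lexicon; a real one would be derived from UMLS relations.
lexicon = {"carcinoma": ["cancer"], "cancer": ["carcinoma"]}

retrofitted = retrofit(vectors, lexicon)
before = np.linalg.norm(vectors["carcinoma"] - vectors["cancer"])
after = np.linalg.norm(retrofitted["carcinoma"] - retrofitted["cancer"])
print(f"distance before: {before:.3f}, after: {after:.3f}")
```

In this sketch the two related terms move closer together, while "left", which has no entry in the lexicon, keeps its original vector. The retrofitted vectors could then initialize the embedding layer of a classifier such as the CNN described above.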
