On Learning Better Embeddings from Chinese Clinical Records: Study on Combining In-Domain and Out-Domain Data

Yaqiang Wang, Yongguang Jiang, Hongping Shu, Yunhui Chen

doi:10.18653/v1/w18-2323

Abstract

High-quality word embeddings are of great significance for advancing applications of biomedical natural language processing. In recent years, interest in how to learn good embeddings from English medical text and how to evaluate their quality has become increasingly evident; however, only a limited number of studies have been based on Chinese medical text, particularly Chinese clinical records. Herein, we propose a novel approach for improving the quality of learned embeddings that uses out-domain data as a supplement when Chinese clinical records are limited. Moreover, embedding quality is evaluated based on the Medical Conceptual Similarity Property. The experimental results reveal that selecting good training samples is necessary, and that collecting the right amount of out-domain data and trading off embedding quality against training time are essential factors for obtaining better embeddings.
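As a rough illustration of what "collecting the right amount of out-domain data" could look like in practice, the sketch below keeps every in-domain sentence and adds a controlled number of out-domain sentences. The function name `build_training_corpus`, the `out_ratio` parameter, and the use of plain random sampling are assumptions made for this example; the abstract does not describe the authors' actual sample selection procedure, and it suggests that selection quality matters.

```python
import random


def build_training_corpus(in_domain, out_domain, out_ratio, seed=0):
    """Combine all in-domain sentences with a sample of out-domain sentences.

    out_ratio is the number of out-domain sentences to add per in-domain
    sentence (hypothetical knob, not taken from the paper). Random sampling
    is only a placeholder for a real sample-selection strategy.
    """
    rng = random.Random(seed)
    # Never request more out-domain sentences than are available.
    n_out = min(len(out_domain), int(round(out_ratio * len(in_domain))))
    return list(in_domain) + rng.sample(list(out_domain), n_out)


# Toy tokenized sentences, invented for illustration.
in_domain = [["patient", "reports", "cough"], ["fever", "for", "three", "days"]]
out_domain = [["the", "weather", "is", "cold"], ["aspirin", "relieves", "pain"],
              ["stocks", "rose", "today"], ["drink", "more", "water"]]
corpus = build_training_corpus(in_domain, out_domain, out_ratio=1.0)
```

Sweeping `out_ratio` over several values and retraining the embeddings each time is one way to explore the trade-off between embedding quality and training time that the abstract mentions.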

Highlights

Word embeddings, or embeddings for short, have been widely used in various natural language processing tasks, such as language modeling (Bengio et al, 2003; Sundermeyer et al, 2012; Adams et al, 2017), syntactic parsing (Grefenstette et al, 2014; Tu et al, 2017) and part-of-speech tagging (Yang and Eisenstein, 2016)
Learning embeddings from English medical texts, a hot topic in recent years, has been extensively studied thanks to open datasets, such as UMLS of NLM (Bodenreider, 2004), medical journal abstracts from PubMed (Choi et al, 2016a), and some released clinical data (Finlayson et al, 2014; Stubbs and Uzuner, 2015)
Following the evaluation method for medical concept embeddings proposed by Choi et al (2016b), which is based on the medical conceptual similarity property, we propose a method for distantly evaluating the embeddings learned from Chinese clinical records using an additional standard medical terminology dataset
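The evaluation idea in the last highlight can be sketched as follows: for each term of a given category in a terminology dataset, look at its nearest neighbours in embedding space and reward neighbours of the same category, discounted by rank. This is a minimal sketch in the spirit of a conceptual-similarity measure, not the authors' exact method; the function names, the toy vectors, and the category labels are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def conceptual_similarity_score(embeddings, categories, target, k=2):
    """Average rank-discounted count of same-category terms among the
    k nearest neighbours of every term labelled with `target`."""
    members = [t for t in embeddings if categories.get(t) == target]
    if not members:
        return 0.0
    total = 0.0
    for term in members:
        # Rank all other terms by cosine similarity, most similar first.
        neighbours = sorted(
            (t for t in embeddings if t != term),
            key=lambda t: cosine(embeddings[term], embeddings[t]),
            reverse=True,
        )[:k]
        for rank, other in enumerate(neighbours, start=1):
            if categories.get(other) == target:
                total += 1.0 / math.log2(rank + 1)
    return total / len(members)


# Toy data, invented for illustration (stand-ins for learned embeddings
# and for a standard medical terminology dataset).
toy_embeddings = {
    "cough": [0.9, 0.1],
    "fever": [0.8, 0.2],
    "aspirin": [0.1, 0.9],
    "ibuprofen": [0.2, 0.8],
}
toy_categories = {
    "cough": "symptom", "fever": "symptom",
    "aspirin": "drug", "ibuprofen": "drug",
}
score = conceptual_similarity_score(toy_embeddings, toy_categories, "symptom", k=1)
```

A higher score means that terms the terminology dataset places in the same category also sit close together in the embedding space, which is the property this kind of evaluation relies on.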

Summary

Introduction

Word embeddings, or embeddings for short, have been widely used in various natural language processing tasks, such as language modeling (Bengio et al, 2003; Sundermeyer et al, 2012; Adams et al, 2017), syntactic parsing (Grefenstette et al, 2014; Tu et al, 2017) and part-of-speech tagging (Yang and Eisenstein, 2016). Learning embeddings from English medical texts, a hot topic in recent years, has been extensively studied thanks to open datasets, such as UMLS of NLM (Bodenreider, 2004), medical journal abstracts from PubMed (Choi et al, 2016a), and some released clinical data (Finlayson et al, 2014; Stubbs and Uzuner, 2015). These datasets have been widely used as gold standards for learning embeddings in the biomedical natural language processing community (De Vine et al, 2014; Choi et al, 2016b). Embeddings learned from Chinese clinical records, by contrast, are not yet good enough.


Publication Date: Jan 1, 2018
Citations: 13
License type: cc-by

