Abstract

Well-developed medical terminology systems such as the Unified Medical Language System (UMLS) improve the ability of language models to handle medical entity linking tasks. However, such comprehensive terminology systems are available for only a few languages, such as English. For Chinese, both simplified and traditional, the lack of a well-developed terminology system remains a major obstacle to unifying Chinese medical terminologies by linking medical entities to concepts. In this study, we propose a translation-enhanced contrastive learning scheme that leverages the translations and synonyms in UMLS to infuse knowledge into the language model, and we present a cross-lingual pre-trained language model, TeaBERT, that aligns Chinese and English medical synonyms at the semantic level. Compared with previous cross-lingual language models, TeaBERT performs significantly better on the evaluation datasets, achieving Top-5 accuracies of 93.21%, 89.89%, and 76.45% on the ICD10-CN, CHPO, and RealWorld datasets respectively, and reaches new state-of-the-art performance without task-specific fine-tuning. Our contrastive learning scheme can not only enhance Chinese-English medical concept alignment but also be applied to other languages facing the same challenge.
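The abstract does not give implementation details, but the core idea can be illustrated as a cross-lingual contrastive objective over Chinese-English synonym pairs drawn from UMLS. The sketch below is a minimal illustration under assumed choices (a generic multilingual encoder checkpoint, mean pooling, an InfoNCE loss with in-batch negatives, and a fixed temperature); it is not the paper's actual training setup.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Hypothetical multilingual encoder; TeaBERT's initialization is not described in the abstract.
MODEL_NAME = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)

def embed(terms):
    """Mean-pool token embeddings into one vector per term (illustrative pooling choice)."""
    batch = tokenizer(terms, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state           # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)           # (B, H)

def contrastive_loss(zh_terms, en_terms, temperature=0.05):
    """InfoNCE-style loss with in-batch negatives over cross-lingual synonym pairs.

    Each Chinese term is pulled toward its English UMLS synonym/translation and
    pushed away from the other English terms in the batch (and vice versa).
    """
    zh = F.normalize(embed(zh_terms), dim=-1)
    en = F.normalize(embed(en_terms), dim=-1)
    logits = zh @ en.T / temperature                       # cosine similarities
    labels = torch.arange(len(zh_terms))
    # Symmetric objective: zh -> en and en -> zh retrieval directions.
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2

# Toy batch of UMLS-style cross-lingual synonym pairs.
loss = contrastive_loss(["糖尿病", "心肌梗死"], ["diabetes mellitus", "myocardial infarction"])
loss.backward()  # gradients update the encoder during pre-training
```

In such a scheme, after pre-training, entity linking can be performed zero-shot by embedding a Chinese mention and retrieving the nearest English concept term, which is consistent with the abstract's claim of strong performance without task-specific fine-tuning.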
