Do You Need Embeddings Trained on a Massive Specialized Corpus for Your Clinical Natural Language Processing Task?

Antoine Neuraz, Vincent Looten, Bastien Rance, Anita Burgun, Leonardo Campillos Llanos, Nicolas Garcelon, Nicolas Daniel, Sophie Rosset

doi:10.3233/shti190533

Journal: Studies in Health Technology and Informatics
Publication Date: Aug 21, 2019
Citations: 4

Affiliations: Assistance Publique – Hôpitaux de Paris, University of Paris-Saclay, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur, Institut Polytechnique de Paris, French National Centre for Scientific Research, Centre de Recherche des Cordeliers, Inserm, Sorbonne Paris Cité, Université Paris Cité, Délégation Paris 5

Abstract

We explore the impact of the data source on word representations for two NLP tasks in the French clinical domain: natural language understanding and text classification. We compared word embeddings (fastText) and language models (ELMo), trained either on general-domain data (Wikipedia) or on specialized data (electronic health records, EHR). The best results were obtained with ELMo representations trained on EHR data for one of the two tasks (gains of +7% and +8% in F1-score).
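
To make the reported metric concrete, here is a minimal sketch of how an F1-score and the gain between two systems could be computed. It is not the authors' evaluation code. The function names, the label "drug", and the toy predictions are hypothetical, and the sketch assumes the gain is an absolute difference in percentage points, which the abstract does not specify.

```python
from typing import Sequence


def f1_score(gold: Sequence[str], pred: Sequence[str], positive: str) -> float:
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_gain_points(baseline: float, candidate: float) -> float:
    """Absolute F1 difference in percentage points (an assumed convention)."""
    return (candidate - baseline) * 100


# Toy, made-up labels for illustration only (not data from the paper).
gold = ["drug", "drug", "other", "drug"]
pred_general = ["drug", "other", "other", "other"]    # hypothetical system A
pred_specialized = ["drug", "drug", "drug", "drug"]   # hypothetical system B

f1_a = f1_score(gold, pred_general, positive="drug")
f1_b = f1_score(gold, pred_specialized, positive="drug")
print(f"F1 A = {f1_a:.3f}, F1 B = {f1_b:.3f}, "
      f"gain = {f1_gain_points(f1_a, f1_b):+.1f} points")
```

In a comparison like the one described in the abstract, the same computation would be repeated for each combination of representation (fastText or ELMo) and training corpus (Wikipedia or EHR), keeping the downstream task and test set fixed.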

Full Text