Localizing in-domain adaptation of transformer-based biomedical language models

Tommaso Mario Buonocore, Claudio Crema, Alberto Redolfi, Riccardo Bellazzi, Enea Parimbelli

doi:10.1016/j.jbi.2023.104431

Abstract

In the era of digital healthcare, the huge volumes of textual information generated every day in hospitals constitute an essential but underused asset that could be exploited with task-specific, fine-tuned biomedical language representation models, improving patient care and management. For such specialized domains, previous research has shown that fine-tuned models stemming from broad-coverage checkpoints can benefit substantially from additional training rounds over large-scale in-domain resources. However, these resources are often out of reach for less-resourced languages like Italian, preventing local medical institutions from employing in-domain adaptation. To reduce this gap, our work investigates two accessible approaches to derive biomedical language models in languages other than English, taking Italian as a concrete use case: one based on neural machine translation of English resources, favoring quantity over quality; the other based on a high-grade, narrow-scoped corpus natively written in Italian, thus preferring quality over quantity. Our study shows that data quantity is a harder constraint than data quality for biomedical adaptation, but the concatenation of high-quality data can improve model performance even when dealing with relatively size-limited corpora. The models published from our investigations have the potential to unlock important research opportunities for Italian hospitals and academia. Finally, the lessons learned from the study offer valuable insights towards building biomedical language models that generalize to other less-resourced languages and different domain settings.


Journal: Journal of Biomedical Informatics	Publication Date: Jun 28, 2023
Citations: 8	License type: cc-by-nc-nd


Similar Papers

AMMU: A survey of transformer-based biomedical pretrained language models
Katikapalli Subramanyam Kalyan ... Sivanesan Sangeetha
Journal of Biomedical Informatics | VOL. 126
31 Dec 2021

Application of Transformer-Based Language Models to Detect Hate Speech in Social Media
Swapnanil Mukherjee ... Sujit Das
Journal of Computational and Cognitive Engineering | VOL. 2
17 Dec 2021

Predicting Generalized Anxiety Disorder From Impromptu Speech Transcripts Using Context-Aware Transformer-Based Neural Networks: Model Evaluation Study.
Bazen Gashaw Teferra ... Jonathan Rose
JMIR Mental Health | VOL. 10
28 Mar 2023

Adapting transformer-based language models for heart disease detection and risk factors extraction
Essam H Houssein ... Abdelmgeid A Ali
Journal of Big Data | VOL. 11
04 Apr 2024
