BioVAE: a pre-trained latent variable language model for biomedical text mining.

Hai-Long Trieu, Makoto Miwa, Sophia Ananiadou

doi:10.1093/bioinformatics/btab702

Abstract

Summary

Large-scale pre-trained language models (PLMs) have advanced state-of-the-art (SOTA) performance on various biomedical text mining tasks. The power of such PLMs can be combined with the advantages of deep generative models, and such combinations already exist. However, they are trained only on general-domain text, and biomedical models are still missing. In this work, we describe BioVAE, the first large-scale pre-trained latent variable language model for the biomedical domain, which uses the OPTIMUS framework to train on large volumes of biomedical text. The model shows SOTA performance on several biomedical text mining tasks when compared to existing publicly available biomedical PLMs. In addition, our model can generate more accurate biomedical sentences than the original OPTIMUS output.

Availability and implementation

Our source code and pre-trained models are freely available: https://github.com/aistairc/BioVAE.

Supplementary information

Supplementary data are available at Bioinformatics online.

Journal: Bioinformatics
Publication Date: Oct 12, 2021
Citations: 5
License type: CC BY-NC 4.0

Similar Papers

BioBERT: a pre-trained biomedical language representation model for biomedical text mining.
Jinhyuk Lee ... Donghyeon Kim
Bioinformatics | VOL. 36
10 Sep 2019

Pre-trained language models with domain knowledge for biomedical extractive summarization
Qianqian Xie ... Sophia Ananiadou
Knowledge-Based Systems | VOL. 252
19 Jul 2022

BioHanBERT: A Hanzi-aware Pre-trained Language Model for Chinese Biomedical Text Mining
Xiaosu Wang ... Jingwen Yue
01 Dec 2021

StaResGRU-CNN with CMedLMs: A stacked residual GRU-CNN with pre-trained biomedical language models for predictive intelligence
Pin Ni ... Victor Chang
Applied Soft Computing | VOL. 113
13 Oct 2021
