Abstract

Bio-BERT (BERT for Biomedical Text Mining) is a Natural Language Processing (NLP) model pre-trained on large-scale biomedical corpora. Bio-BERT is effective across a wide variety of NLP tasks applied to biomedical text. BERTSUM, BERTSUMABS, and BERTSUMEXTABS are NLP models built for Extractive Text Summarization (ETS) and Abstractive Text Summarization (ATS), and they have been evaluated on the CNN/DailyMail and Extreme Summarization datasets. In this chapter, the objective is to perform ETS and ATS on the CORD-19 dataset. A hybrid NLP model based on Bio-BERT, BERTSUM, BERTSUMABS, and BERTSUMEXTABS is proposed. Because the objective is summarization of biomedical text, Bio-BERT is chosen as it is pre-trained on biomedical PubMed full-text articles; BERTSUM, BERTSUMABS, and BERTSUMEXTABS are chosen because they are fine-tuned specifically for text summarization. With the rapid acceleration of novel COVID-19 publications, summaries of these publications are needed to save readers' time, and the model-generated summary should be on par with a human-written summary. Experiments were conducted on the CORD-19 dataset, and the proposed hybrid model was evaluated using the ROUGE metric. Compared with the BERT-based BERTSUM, BERTSUMABS, and BERTSUMEXTABS on the CORD-19 dataset, the proposed model achieves the highest ROUGE scores for both ETS and ATS.
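The sketch below is not the chapter's implementation; it only illustrates the two ingredients the abstract names: encoding biomedical sentences with a pre-trained Bio-BERT checkpoint, and scoring a generated summary against a human-written reference with ROUGE. The checkpoint name `dmis-lab/biobert-base-cased-v1.1`, the cosine-similarity sentence ranking (a stand-in for a trained BERTSUM-style classifier), and the helper names are assumptions made for illustration.

```python
# Minimal sketch, assuming the transformers, torch, and rouge-score packages.
# The checkpoint name and the naive ranking heuristic are assumptions; the
# chapter's hybrid model fine-tunes BERTSUM-style heads instead.
import torch
from transformers import AutoTokenizer, AutoModel
from rouge_score import rouge_scorer

MODEL_NAME = "dmis-lab/biobert-base-cased-v1.1"  # assumed Bio-BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
encoder.eval()

def embed(texts):
    """Return one [CLS] vector per input text from the Bio-BERT encoder."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**batch)
    return out.last_hidden_state[:, 0, :]  # [CLS] embeddings

def extractive_summary(sentences, k=3):
    """Pick the k sentences closest to the mean document embedding
    (an illustrative stand-in for a trained extractive classifier)."""
    sent_vecs = embed(sentences)
    doc_vec = sent_vecs.mean(dim=0, keepdim=True)
    scores = torch.nn.functional.cosine_similarity(sent_vecs, doc_vec)
    top = scores.topk(min(k, len(sentences))).indices.sort().values
    return " ".join(sentences[int(i)] for i in top)

# ROUGE evaluation of a model summary against a human-written reference.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=True)

def rouge_f1(reference, candidate):
    """Return ROUGE-1/2/L F1 scores for a candidate summary."""
    return {name: s.fmeasure
            for name, s in scorer.score(reference, candidate).items()}
```

For example, `rouge_f1(human_summary, extractive_summary(paper_sentences))` would give the ROUGE-1, ROUGE-2, and ROUGE-L F1 values of the kind reported for comparing model output with reference summaries.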
