Abstract
Purpose
The purpose of this paper is to build a question answering (QA) system that improves the retrieval of answers to COVID-19 queries from the COVID-19 Open Research Dataset (CORD-19). Because CORD-19 is an up-to-date collection of coronavirus literature, text mining approaches can be used to retrieve answers to coronavirus-related questions. The existing A Lite BERT (ALBERT) model for self-supervised learning of language representations is fine-tuned to retrieve COVID-relevant information in response to scientific questions posed by the medical community and to highlight the context related to the COVID-19 query.

Design/methodology/approach
This study presents a fine-tuned ALBERT-based QA system used together with the Best Match 25 (Okapi BM25) ranking function and its variant BM25L for context retrieval. ALBERT-based QA provides high scores on benchmark data sets such as SQuAD. The QA system built in this paper is pre-trained on SQuAD and fine-tuned on CORD-19 data so that it answers COVID-19 questions by extracting information that is semantically relevant to the question.

Findings
BM25L is found to be more effective for retrieval than Okapi BM25. Fine-tuned ALBERT, when extended to the CORD-19 data set, provided accurate results.

Originality/value
The fine-tuned ALBERT QA system was developed and tested for the first time on the CORD-19 data set to extract context and highlight the span of the answer for more clarity to the user.
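
The difference between the two ranking functions named above can be illustrated with a minimal sketch. It is not the paper's implementation: the toy documents, the whitespace tokenizer, the parameter values (k1 = 1.5, b = 0.75, delta = 0.5) and the helper names build_index, bm25_scores and bm25l_scores are all assumptions made for illustration. The Okapi scorer uses a commonly used non-negative IDF variant, and the BM25L scorer follows the usual formulation in which a constant delta is added to the length-normalized term frequency so that long documents are penalized less.

```python
import math
from collections import Counter


def build_index(docs):
    """Tokenize documents (whitespace split, an assumption) and gather corpus statistics."""
    tokens = [doc.lower().split() for doc in docs]
    tfs = [Counter(t) for t in tokens]
    df = Counter(term for tf in tfs for term in tf)  # document frequency per term
    lens = [len(t) for t in tokens]
    return {"tfs": tfs, "lens": lens, "df": df,
            "avgdl": sum(lens) / len(lens), "n": len(docs)}


def bm25_scores(query, index, k1=1.5, b=0.75):
    """Okapi BM25 with a non-negative IDF variant (illustrative parameter values)."""
    scores = []
    for tf, dl in zip(index["tfs"], index["lens"]):
        norm = 1 - b + b * dl / index["avgdl"]  # document length normalization
        s = 0.0
        for term in query.lower().split():
            f = tf.get(term, 0)
            if f == 0:
                continue
            df = index["df"][term]
            idf = math.log(1 + (index["n"] - df + 0.5) / (df + 0.5))
            s += idf * f * (k1 + 1) / (f + k1 * norm)
        scores.append(s)
    return scores


def bm25l_scores(query, index, k1=1.5, b=0.75, delta=0.5):
    """BM25L: shifts the normalized term frequency by delta before saturation."""
    scores = []
    for tf, dl in zip(index["tfs"], index["lens"]):
        norm = 1 - b + b * dl / index["avgdl"]
        s = 0.0
        for term in query.lower().split():
            f = tf.get(term, 0)
            if f == 0:
                continue
            df = index["df"][term]
            idf = math.log((index["n"] + 1) / (df + 0.5))
            c = f / norm  # length-normalized term frequency
            s += idf * (k1 + 1) * (c + delta) / (k1 + c + delta)
        scores.append(s)
    return scores


# Hypothetical toy corpus, not taken from CORD-19.
docs = [
    "coronavirus incubation period is about five days",
    "masks reduce transmission of respiratory viruses",
    "the incubation period of the virus varies between patients in long "
    "clinical studies of coronavirus infection and recovery",
]
query = "coronavirus incubation period"
index = build_index(docs)
okapi = bm25_scores(query, index)
bm25l = bm25l_scores(query, index)
print("Okapi BM25:", okapi)
print("BM25L:     ", bm25l)
```

In a pipeline of the kind described above, the top-ranked passages from such a scorer would then be passed as context to the fine-tuned ALBERT reader, which selects the answer span.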