Multistage BiCross encoder for multilingual access to COVID-19 health information.

Iknoor Singh,Kalina Bontcheva,Carolina Scarton

doi:10.1371/journal.pone.0256874

Open Access

https://doi.org/10.1371/journal.pone.0256874

Journal: PLOS ONE
Publication Date: Sep 7, 2021
License type: CC BY 4.0

Affiliation: University of Sheffield

Abstract

The Coronavirus (COVID-19) pandemic has led to a rapidly growing 'infodemic' of health information online. This has motivated the need for accurate semantic search and retrieval of reliable COVID-19 information across millions of documents, in multiple languages. To address this challenge, this paper proposes a novel high precision and high recall neural Multistage BiCross encoder approach. It is a sequential three-stage ranking pipeline which uses the Okapi BM25 retrieval algorithm and transformer-based bi-encoder and cross-encoder to effectively rank the documents with respect to the given query. We present experimental results from our participation in the Multilingual Information Access (MLIA) shared task on COVID-19 multilingual semantic search. The independently evaluated MLIA results validate our approach and demonstrate that it outperforms other state-of-the-art approaches according to nearly all evaluation metrics in cases of both monolingual and bilingual runs.

Highlights

The COVID-19 pandemic has, to date, infected more than 135M people worldwide
We evaluate the performance of Multistage BiCross Encoder using the relevance assessments provided by the Multilingual Information Access (MLIA) organisers
We focus on subtask 2, where we aim to achieve both high recall and high precision with the fewest retrieved documents per topic query

Summary

Introduction

The COVID-19 pandemic has, to date, infected more than 135M people worldwide. It has been accompanied by what the World Health Organisation has dubbed an ‘infodemic’: the challenge people face in navigating and absorbing the continuously growing volume of information on the origin, treatment, prevention, and public policies related to COVID-19, published online by numerous sources (some authoritative and some not) in multiple languages and countries. We present a novel Multistage BiCross encoder method and demonstrate that it outperforms other state-of-the-art retrieval methods for COVID-19 multilingual semantic search, according to independent comparative evaluation on the MLIA shared task 2 dataset. The method is a three-stage ranking pipeline that combines the Okapi BM25 retrieval algorithm with state-of-the-art multilingual transformer-based bi-encoder and cross-encoder models, and ranks documents by aggregating sentence-level relevance scores. The paper also describes the use of external training datasets, which further improved the model’s performance and helped it achieve the best reported scores on the MLIA COVID-19 semantic search task, according to the independent comparative evaluation reports by the shared task organisers https://bitbucket.org/covid19-mlia/organizers-task2/src/master/ [10].
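
To make the shape of such a three-stage pipeline concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a plain Okapi BM25 scorer (with the always-positive IDF variant used by Lucene) for the first stage, and it takes the two neural stages as pluggable functions. The names `multistage_rank`, `aggregate_sentences`, `toy_overlap_scorer`, `keep_bm25` and `keep_refine` are hypothetical. The summary does not say how sentence-level scores are aggregated, so taking the maximum over sentences is an assumption, and the word-overlap scorer is only a stand-in for a trained bi-encoder or cross-encoder.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Sequence

Scorer = Callable[[str, str], float]  # (query, sentence) -> relevance score


def tokenize(text: str) -> List[str]:
    """Lowercase word tokenizer (a simplification; real systems do more)."""
    return re.findall(r"\w+", text.lower())


class BM25:
    """Okapi BM25 over a small in-memory collection."""

    def __init__(self, docs: Sequence[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tokens = [tokenize(d) for d in docs]
        self.tf = [Counter(t) for t in self.tokens]
        self.avgdl = sum(len(t) for t in self.tokens) / len(docs)
        df = Counter()
        for t in self.tokens:
            df.update(set(t))
        n = len(docs)
        # Assumption: Lucene-style IDF, which never goes negative.
        self.idf = {w: math.log(1 + (n - f + 0.5) / (f + 0.5)) for w, f in df.items()}

    def score(self, query: str, idx: int) -> float:
        tf, dl = self.tf[idx], len(self.tokens[idx])
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total


def aggregate_sentences(scorer: Scorer, query: str, doc: str) -> float:
    """Score each sentence and keep the best one (max is an assumption)."""
    sentences = [s for s in re.split(r"[.!?]+", doc) if s.strip()]
    return max((scorer(query, s) for s in sentences), default=0.0)


def multistage_rank(query: str, docs: Sequence[str], refine: Scorer,
                    rerank: Scorer, keep_bm25: int = 100,
                    keep_refine: int = 10) -> List[int]:
    """Return document indices, best first, after three narrowing stages."""
    if not docs:
        return []
    bm25 = BM25(docs)
    # Stage 1: cheap lexical retrieval over the whole collection.
    cand = sorted(range(len(docs)), key=lambda i: bm25.score(query, i),
                  reverse=True)[:keep_bm25]
    # Stage 2: refinement (a bi-encoder would be plugged in here).
    cand = sorted(cand, key=lambda i: aggregate_sentences(refine, query, docs[i]),
                  reverse=True)[:keep_refine]
    # Stage 3: final re-ranking (a cross-encoder would be plugged in here).
    return sorted(cand, key=lambda i: aggregate_sentences(rerank, query, docs[i]),
                  reverse=True)


def toy_overlap_scorer(query: str, sentence: str) -> float:
    """Stand-in for a neural model: Jaccard overlap of word sets."""
    q, s = set(tokenize(query)), set(tokenize(sentence))
    return len(q & s) / len(q | s) if q | s else 0.0
```

Each stage only sees the survivors of the previous one, so the more expensive scorers run on progressively fewer documents. In practice the two `Scorer` arguments would wrap trained models rather than `toy_overlap_scorer`.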

MLIA COVID-19 semantic search task

Dataset

Task description

Related work

Multistage BiCross encoder

BM25 retrieval stage

Neural refinement stage

Neural re-ranking stage

Implementation details

Results and discussion

Monolingual runs

Bilingual runs

Conclusion
