Abstract

The Coronavirus (COVID-19) pandemic has led to a rapidly growing 'infodemic' of health information online. This has motivated the need for accurate semantic search and retrieval of reliable COVID-19 information across millions of documents, in multiple languages. To address this challenge, this paper proposes a novel high-precision, high-recall neural Multistage BiCross encoder approach. It is a sequential three-stage ranking pipeline that uses the Okapi BM25 retrieval algorithm and transformer-based bi-encoder and cross-encoder models to rank documents with respect to a given query. We present experimental results from our participation in the Multilingual Information Access (MLIA) shared task on COVID-19 multilingual semantic search. The independently evaluated MLIA results validate our approach and demonstrate that it outperforms other state-of-the-art approaches according to nearly all evaluation metrics, in both monolingual and bilingual runs.

Highlights

  • The COVID-19 pandemic has, to date, infected more than 135M people worldwide

  • We evaluate the performance of the Multistage BiCross encoder using the relevance assessments provided by the Multilingual Information Access (MLIA) organisers

  • We focus on subtask 2, where we aim to achieve both high recall and high precision with the fewest retrieved documents per topic query
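The trade-off in the last highlight, recall and precision at a small number of retrieved documents, can be illustrated with a generic precision-at-k and recall-at-k computation. This is a minimal sketch only: the function name, the document identifiers, and the cutoff `k` are hypothetical, and the official MLIA evaluation measures are not listed in this summary.

```python
from typing import List, Set, Tuple


def precision_recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> Tuple[float, float]:
    """Return (precision@k, recall@k) for one topic query.

    ranked   -- document ids in the order the system returned them
    relevant -- ids judged relevant in the relevance assessments
    k        -- number of top-ranked documents to consider
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k  # fraction of the k returned slots that are relevant
    recall = hits / len(relevant) if relevant else 0.0  # fraction of relevant docs found
    return precision, recall


# Hypothetical ranking and judgements for a single topic.
ranked_ids = ["d3", "d1", "d7", "d2"]
relevant_ids = {"d1", "d2", "d9"}
p_at_2, r_at_2 = precision_recall_at_k(ranked_ids, relevant_ids, k=2)
```

Retrieving fewer documents per topic makes high recall harder to reach, which is why both values are reported at the same cutoff.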


Summary

Introduction

The COVID-19 pandemic has, to date, infected more than 135M people worldwide. It has been accompanied by what the World Health Organisation has dubbed an ‘infodemic’: people struggle to navigate and absorb the continuously growing volume of information on the origin, treatment, prevention, and public policies related to COVID-19, which is published online by numerous sources (some authoritative and some not), in multiple languages and countries. We present a novel Multistage BiCross encoder method and demonstrate that it outperforms other state-of-the-art retrieval methods for COVID-19 multilingual semantic search, according to independent comparative evaluation on the MLIA shared task 2 dataset. The method is a three-stage ranking pipeline that uses the Okapi BM25 retrieval algorithm followed by state-of-the-art multilingual transformer-based bi-encoder and cross-encoder models, and ranks documents by aggregating sentence-level relevance scores. We also describe the use of external training datasets, which further improved the model’s performance and helped it achieve the best reported scores on the MLIA COVID-19 semantic search task, according to the independent comparative evaluation reports by the shared task organisers (https://bitbucket.org/covid19-mlia/organizers-task2/src/master/) [10].
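The three-stage flow described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the BM25 parameters (`k1`, `b`), the non-negative IDF variant, the candidate cutoffs (`k_bm25`, `k_bi`), the regex sentence splitter, and the use of the maximum as the sentence-score aggregation are all assumptions, and `toy_bi_encoder` and `toy_cross_encoder` are lexical stand-ins for the multilingual transformer models used in the paper.

```python
import math
import re
from collections import Counter
from typing import Callable, List

Scorer = Callable[[str, str], float]  # (query, sentence) -> relevance score


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def split_sentences(doc: str) -> List[str]:
    parts = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc)]
    return [s for s in parts if s]


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every document (k1, b and the IDF form are assumed)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / max(n_docs, 1)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def toy_bi_encoder(query: str, sentence: str) -> float:
    """Stand-in for a bi-encoder: texts are vectorised separately, then compared by cosine."""
    q, s = Counter(tokenize(query)), Counter(tokenize(sentence))
    dot = sum(q[t] * s[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in s.values()))
    return dot / norm if norm else 0.0


def toy_cross_encoder(query: str, sentence: str) -> float:
    """Stand-in for a cross-encoder: the pair is scored jointly (Jaccard overlap here)."""
    q, s = set(tokenize(query)), set(tokenize(sentence))
    return len(q & s) / len(q | s) if q | s else 0.0


def doc_score(query: str, doc: str, scorer: Scorer) -> float:
    """Aggregate sentence-level scores into a document score (max is an assumption)."""
    return max((scorer(query, s) for s in split_sentences(doc)), default=0.0)


def rank(query: str, docs: List[str], bi: Scorer = toy_bi_encoder,
         cross: Scorer = toy_cross_encoder, k_bm25: int = 100, k_bi: int = 20) -> List[int]:
    """Return document indices, best first, after the three stages."""
    # Stage 1: BM25 retrieval keeps the top k_bm25 candidates.
    s1 = bm25_scores(query, docs)
    cand = sorted(range(len(docs)), key=lambda i: s1[i], reverse=True)[:k_bm25]
    # Stage 2: refinement with the bi-encoder-style scorer keeps the top k_bi.
    cand = sorted(cand, key=lambda i: doc_score(query, docs[i], bi), reverse=True)[:k_bi]
    # Stage 3: final re-ranking with the cross-encoder-style scorer.
    return sorted(cand, key=lambda i: doc_score(query, docs[i], cross), reverse=True)
```

Each stage narrows the candidate set, so the most expensive scorer only runs on the few documents that survive the cheaper stages. In a real system the two toy scorers would be replaced by calls to trained multilingual models.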

MLIA COVID-19 semantic search task
Dataset
Task description
Related work
Multistage BiCross encoder
BM25 retrieval stage
Neural refinement stage
Neural re-ranking stage
Implementation details
Results and discussion
Monolingual runs
Bilingual runs
Conclusion
