Abstract
The task of Cross-lingual Passage Re-ranking (XPR) aims to rank a list of candidate passages in multiple languages given a query. It is generally challenged by two main issues: (1) the query and the passages to be ranked are often in different languages, which requires strong cross-lingual alignment, and (2) annotated data for model training and evaluation are scarce. In this article, we propose a two-stage approach to address these issues. In the first stage, we introduce the task of Cross-lingual Paraphrase Identification (XPI) as an additional pre-training step that strengthens cross-lingual alignment by leveraging a large unsupervised parallel corpus. The XPI task is to identify whether two sentences, possibly in different languages, have the same meaning. In the second stage, we introduce and compare three effective strategies for cross-lingual training. To verify the effectiveness of our method, we construct an XPR dataset by assembling and modifying two monolingual datasets. Experimental results show that the augmented pre-training contributes significantly to the XPR task. In addition, we directly transfer the trained model to out-of-domain test data constructed by modifying three multilingual Question Answering (QA) datasets; the results demonstrate the cross-domain robustness of the proposed approach.
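The sketch below illustrates how XPI pre-training can be framed as sentence-pair binary classification on top of mBERT, as described above. It is a minimal illustration, not the authors' implementation: the training-step helper, the negative-sampling choice, and the example sentence pair are assumptions for demonstration purposes.

```python
# Minimal sketch of XPI-style pre-training: classify whether two sentences,
# possibly in different languages, have the same meaning. Assumes the standard
# Hugging Face mBERT checkpoint; pipeline details are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2  # 1 = paraphrase, 0 = not
)

def xpi_step(sent_a, sent_b, labels):
    """One training step on a batch of (possibly cross-lingual) sentence pairs."""
    batch = tokenizer(sent_a, sent_b, padding=True, truncation=True,
                      return_tensors="pt")
    out = model(**batch, labels=torch.tensor(labels))
    out.loss.backward()  # optimizer step omitted for brevity
    return out.loss.item()

# Positive pair drawn from a parallel corpus (English / Chinese); negative
# pairs would be sampled from non-aligned sentences.
loss = xpi_step(["How old are you?"], ["你多大了？"], [1])
```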
Highlights
Passage re-ranking is an essential task in many Natural Language Processing (NLP) applications, such as passage retrieval for open-domain question answering.
In this article, we explore Cross-lingual Passage Re-ranking (XPR), which refers to ranking a list of candidate passages in multiple languages, of which only a portion are in the same language as the query.
Instead of training a transformer architecture from scratch, we adopt multilingual BERT (mBERT)'s pre-trained weights as the initialization for basic language understanding, and we extend the pre-training with the XPI task.
Summary
Passage re-ranking is an essential task in many Natural Language Processing (NLP) applications, such as passage retrieval for open-domain question answering. It requires a system to rank a list of candidate passages based on the provided query. The existing passage re-ranking literature commonly assumes that the query and the passages to be ranked are in the same language, e.g., English or Chinese. In practice, however, a system such as a multilingual service robot requires a re-ranking module that can operate in a cross-lingual scenario. To address this issue, in this article we explore Cross-lingual Passage Re-ranking (XPR), which refers to ranking a list of candidate passages in multiple languages, of which only a portion are in the same language as the query.
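As a rough illustration of the XPR setting described in the summary, the sketch below scores each query-passage pair with an mBERT cross-encoder and sorts the candidates by relevance. The checkpoint (shown here as the base mBERT model rather than a fine-tuned one), the scoring head, and the example passages are assumptions, not the paper's released model or data.

```python
# Hedged sketch: re-ranking candidate passages in multiple languages for one
# query using an mBERT-based cross-encoder. In practice the model would be
# fine-tuned for query-passage relevance; here we only show the scoring flow.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
reranker = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2)  # class 1 = "relevant" (assumed)

def rerank(query, passages):
    """Score each (query, passage) pair and return passages sorted by P(relevant)."""
    batch = tokenizer([query] * len(passages), passages,
                      padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = reranker(**batch).logits
    scores = torch.softmax(logits, dim=-1)[:, 1]
    order = torch.argsort(scores, descending=True)
    return [(passages[i], scores[i].item()) for i in order]

# Candidates in different languages; only some match the query's language.
ranked = rerank(
    "Who wrote Don Quixote?",
    ["Miguel de Cervantes escribió Don Quijote.",          # Spanish, relevant
     "巴黎是法国的首都。",                                    # Chinese, irrelevant
     "Don Quixote was written by Miguel de Cervantes."])   # English, relevant
```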