Cross-lingual event retrieval is an information retrieval task aimed at cross-lingual event retrieval among multiple languages to find text or documents related to a specific event. Specific to Chinese-Vietnamese cross-language event retrieval, it involves using Chinese as a query to retrieve Vietnamese documents related to the query event. The critical issue is how to efficiently align query and document representations with limited resources. Existing cross-language pre-training models are trained on large-scale multilingual corpora, but their training goals do not include explicit language alignment tasks. Due to the uneven distribution of training corpora between different languages, these models have The problem of language bias. Therefore, this linguistic bias is also inherited in cross-lingual retrieval based on these models. To solve this problem, this paper proposes a Chinese-Vietnamese cross-lingual event retrieval method based on knowledge distillation. This approach enables the model to learn good query-document matching features from monolingual retrieval by transferring knowledge from high-resource to low-resource languages. By enhancing the alignment between queries and documents in different languages in a shared semantic space, the method improves the performance of Chinese-Vietnamese cross-lingual event retrieval.
Read full abstract