Abstract
Bilingual lexicon induction focuses on learning word translation pairs, also known as bitexts, from monolingual corpora by establishing a mapping between the source and target embedding spaces. Despite recent advancements, bilingual lexicon induction is limited to inducing bitexts consisting of individual words and cannot handle semantically rich phrases. To bridge this gap and support downstream cross‐lingual tasks, it is practical to develop a bilingual phrase induction method that extracts bilingual phrase pairs from monolingual corpora without relying on cross‐lingual knowledge. In this paper, we propose a novel phrase embedding training method based on the skip‐gram structure. Specifically, we introduce a local hard negative sampling strategy that utilises negative samples of central tokens in sliding windows to enhance phrase embedding learning. The proposed method achieves competitive or superior performance compared to baseline approaches, with exceptional results on distant language pairs. Additionally, we develop a phrase representation learning method that leverages multilingual pre‐trained language models (mPLMs). These mPLM‐based representations can be combined with the above‐mentioned static phrase embeddings to further improve accuracy on the bilingual phrase induction task. We manually construct a dataset of bilingual phrase pairs and integrate it with MUSE to facilitate the bilingual phrase induction task.
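For readers unfamiliar with the skip-gram objective the abstract builds on, the following is a minimal sketch of the standard skip-gram negative-sampling (SGNS) loss for a single (center, context) pair. It is a generic illustration, not the paper's implementation: the function name `sgns_loss` and the toy vectors are hypothetical, and the `negative_vecs` argument merely stands in for wherever the negatives come from (in the proposed method, hard negatives of central tokens drawn from the local sliding window).

```python
import numpy as np

def sgns_loss(center_vec, context_vec, negative_vecs):
    """Skip-gram negative-sampling loss for one (center, context) pair.

    The positive pair is pulled together (first term) while each
    negative sample is pushed away from the center (second term).
    Hypothetical sketch; real trainers batch this and backpropagate.
    """
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    pos = np.log(sigmoid(center_vec @ context_vec))
    neg = sum(np.log(sigmoid(-center_vec @ n)) for n in negative_vecs)
    return -(pos + neg)  # minimised during training

# Toy usage with random vectors
rng = np.random.default_rng(0)
c = rng.normal(size=8)
ctx = rng.normal(size=8)
negs = [rng.normal(size=8) for _ in range(5)]
loss = sgns_loss(c, ctx, negs)
```

Harder (more similar) negatives make the second term costlier, which is the intuition behind preferring hard negatives over uniformly sampled ones.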