A cross-lingual sentence pair interaction feature capture model based on pseudo-corpus and multilingual embedding

Gang Liu, Zhizheng Yan, Kai Wang, Yichao Dong

doi:10.3233/aic-210085

Abstract

Recently, the emergence of the digital language divide and the availability of cross-lingual benchmarks have made research on cross-lingual text increasingly popular. However, existing methods based on mapping relations do not perform well enough, because the structures of different language spaces are sometimes not isomorphic. In addition, polysemy makes interaction features hard to extract. For cross-lingual word embedding, a model named Cross-lingual Word Embedding Space Based on Pseudo Corpus (CWE-PC) is proposed to obtain cross-lingual and multilingual word embeddings. For capturing cross-lingual sentence pair interaction features, a model named Cross-language Feature Capture Based on Similarity Matrix (CFC-SM) is built. A pretrained ELMo model is used to alleviate polysemy, and multiple convolution layers are used to extract interaction features. The models are evaluated on multiple language pairs, and the results show that they outperform state-of-the-art cross-lingual word embedding methods.
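The abstract describes CFC-SM only at a high level: a similarity matrix between the two sentences' embeddings, followed by multiple convolution layers that extract interaction features. The sketch below is a minimal illustration of that general idea under stated assumptions, not the authors' implementation. It assumes cosine similarity, a single-channel stack of unpadded 2D convolutions with ReLU, and global max pooling; the toy vectors stand in for ELMo or CWE-PC embeddings, and all function names and kernels are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def similarity_matrix(src: List[Vector], tgt: List[Vector]) -> Matrix:
    """S[i][j] = cosine similarity of source token i and target token j."""
    return [[cosine(s, t) for t in tgt] for s in src]


def conv2d_valid(x: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel 2D cross-correlation, no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    h = len(x)
    w = len(x[0]) if h else 0
    return [
        [
            sum(x[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(w - kw + 1)
        ]
        for i in range(h - kh + 1)
    ]


def relu(x: Matrix) -> Matrix:
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in x]


def interaction_features(src: List[Vector], tgt: List[Vector], layers: List[Matrix]) -> float:
    """Hypothetical pipeline: similarity matrix -> stacked conv+ReLU -> global max pool.

    Returns 0.0 when the sentences are too short for the kernels.
    """
    fmap = similarity_matrix(src, tgt)
    for kernel in layers:
        fmap = relu(conv2d_valid(fmap, kernel))
    values = [v for row in fmap for v in row]
    return max(values) if values else 0.0


if __name__ == "__main__":
    # Toy 2-d "embeddings" (assumption): 2 source tokens, 3 target tokens.
    src = [[1.0, 0.0], [0.0, 1.0]]
    tgt = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(similarity_matrix(src, tgt))
    print(interaction_features(src, tgt, [[[1.0, 1.0], [1.0, 1.0]]]))
```

In a trained model the kernels would be learned rather than fixed by hand, and each layer would typically have many channels; the single hand-written kernel here only shows how a convolution reads local patterns of token-to-token similarity.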

Journal: AI Communications
Publication Date: May 10, 2022
Citations: 2

Similar Papers

A word embedding-based approach to cross-lingual topic modeling
Chia-Hsuan Chang ... San-Yih Hwang
Knowledge and Information Systems, Vol. 63, 24 Apr 2021

Do We Really Need Fully Unsupervised Cross-Lingual Embeddings?
Ivan Vulić ... Anna Korhonen
01 Jan 2019

Unsupervised Cross-Lingual Sentence Representation Learning via Linguistic Isomorphism
Shuai Wang ... Lei Hou
01 Jan 2019

Pre-tokenization of Multi-word Expressions in Cross-lingual Word Embeddings
Naoki Otani ... Lori Levin
01 Jan 2020
