Abstract

Recently, the emergence of the digital language division and the availability of cross-lingual benchmarks make researches of cross-lingual texts more popular. However, the performance of existing methods based on mapping relation are not good enough, because sometimes the structures of language spaces are not isomorphic. Besides, polysemy makes the extraction of interaction features hard. For cross-lingual word embedding, a model named Cross-lingual Word Embedding Space Based on Pseudo Corpus (CWE-PC) is proposed to obtain cross-lingual and multilingual word embedding. For cross-lingual sentence pair interaction feature capture, a Cross-language Feature Capture Based on Similarity Matrix (CFC-SM) model is built to extract cross-lingual interaction features. ELMo pretrained model and multiple layer convolution are used to alleviate polysemy and extract interaction features. These models are evaluated on multiple language pairs and results show that they outperform the state-of-the-art cross-lingual word embedding methods.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call