Topic-Based Unsupervised and Supervised Dictionary Induction

Yuzhi Liu,Massimo Piccardi

doi:10.1145/3564698

Abstract

Word translation is a natural language processing task that provides translation between the words of a source and a target language. As a task, it reduces to the induction of a bilingual dictionary, which is typically performed by aligning word embeddings of the source language to word embeddings of the target language. To date, all the existing approaches have focused on performing a single, global alignment in word embedding space. However, semantic differences between the various languages, in addition to differences in the content of the corpora used for training the word embeddings, can hinder the effectiveness of such a global alignment. For this reason, in this article we propose conducting the alignment between the source and target embedding spaces by multiple mappings at topic level. The experimental results show that our approach has been able to achieve an average accuracy improvement of +3.30 percentage points over a state-of-the-art approach in unsupervised dictionary induction from languages as diverse as German, French, Italian, Spanish, Finnish, Turkish, and Chinese to English, and +3.95 points average improvement in supervised dictionary induction.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

R Discovery Prime

R Discovery Prime

Topic-Based Unsupervised and Supervised Dictionary Induction

Abstract

Talk to us

Similar Papers

More From: ACM Transactions on Asian and Low-Resource Language Information Processing

Lead the way for us

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing	Publication Date: Mar 10, 2023
Citations: 1

Similar Papers

Word Embeddings for Natural Language Processing

-

01 Jan 2015
01 Jan 2015

Visual exploration and comparison of word embeddings
Juntian Chen ... Hai Lin
Journal of Visual Languages & Computing | VOL. 48
Juntian Chen, et. al.Juntian Chen ... Hai Lin
24 Aug 2018
Journal of Visual Languages & Computing | VOL. 48

Word Embedding for Bengali Language using Domain-related Corpus
Ashutosh Bandyopadhyay ... Jayashree Nair
-
Ashutosh Bandyopadhyay, et. al.Ashutosh Bandyopadhyay ... Jayashree Nair
26 Apr 2023
26 Apr 2023

Multi-Task Text Classification using Graph Convolutional Networks for Large-Scale Low Resource Language
Mounika Marreddy ... Radhika Mamidi
-
Mounika Marreddy, et. al.Mounika Marreddy ... Radhika Mamidi
18 Jul 2022
18 Jul 2022

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

Topic-Based Unsupervised and Supervised Dictionary Induction

Abstract

Talk to us

Similar Papers

More From: ACM Transactions on Asian and Low-Resource Language Information Processing