Emerging Language Spaces Learned From Massively Multilingual Corpora

Jörg Tiedemann

doi:10.5617/dhnbpub.11065

Abstract

Translations capture important information about languages that can serve as implicit supervision for learning linguistic properties and semantic representations. From an information-centric view, translated texts can be considered semantic mirrors of the original text, and the significant variation observed across languages can be used to disambiguate a given expression through the linguistic signal grounded in translation. Parallel corpora that contain massive amounts of human translations with large linguistic variation can be used to increase the level of abstraction, and we propose the use of highly multilingual machine translation models to find language-independent meaning representations. Our initial experiments show that neural machine translation models can indeed learn in such a setup, and that the learning algorithm picks up information about the relations between languages in order to optimize transfer learning with shared parameters. The model creates a continuous language space that represents relationships in terms of geometric distances, which we can visualize to illustrate how languages cluster according to language families and groups. Does this open the door for new ideas of data-driven language typology with promising models and techniques in empirical cross-linguistic research?
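The idea of a language space in which relationships appear as geometric distances can be illustrated with a minimal sketch. It assumes each language is represented by a small vector; the vectors below are invented toy values, not embeddings from the model described in the abstract, and the names `LANGUAGE_VECTORS`, `cosine_distance` and `nearest_language` are hypothetical. Cosine distance is used here as one plausible choice of metric; the abstract does not say which distance the author uses.

```python
import math

# Hypothetical toy language vectors, invented purely for illustration.
# They are NOT values learned by the model described in the abstract.
LANGUAGE_VECTORS = {
    "sv": [0.9, 0.1, 0.0],  # Swedish
    "da": [0.8, 0.2, 0.1],  # Danish
    "fi": [0.1, 0.9, 0.2],  # Finnish
    "et": [0.2, 0.8, 0.3],  # Estonian
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def nearest_language(lang, vectors):
    """Return the language whose vector is closest to that of `lang`."""
    return min(
        (other for other in vectors if other != lang),
        key=lambda other: cosine_distance(vectors[lang], vectors[other]),
    )


if __name__ == "__main__":
    for lang in LANGUAGE_VECTORS:
        print(lang, "is closest to", nearest_language(lang, LANGUAGE_VECTORS))
```

With these toy values the two North Germanic languages end up nearest to each other, as do the two Finnic languages, which is the kind of grouping the abstract describes. A real analysis would start from vectors extracted from a trained multilingual model and would typically apply a clustering or projection method before visualization.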

Journal: Digital Humanities in the Nordic and Baltic Countries Publications
Publication Date: Mar 29, 2018
License type: CC BY 4.0
