Building and using multimodal comparable corpora for machine translation

Haithem Afli, Loïc Barrault, Holger Schwenk

doi:10.1017/s1351324916000152

Abstract

In recent decades, statistical approaches have significantly advanced the development of machine translation systems. However, the applicability of these methods depends directly on the availability of very large quantities of parallel data. Recent work has demonstrated that a comparable corpus can compensate for the shortage of parallel corpora. In this paper, we propose an alternative to comparable corpora containing text documents as resources for extracting parallel data: a multimodal comparable corpus with audio documents in the source language and text documents in the target language, built from the Euronews and TED web sites. The audio is transcribed by an automatic speech recognition system and translated with a baseline statistical machine translation system. We then use information retrieval in a large text corpus in the target language in order to extract parallel sentences/phrases. We evaluate the quality of the extracted data on an English to French translation task and show significant improvements over a state-of-the-art baseline.
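
The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes the speech recognition and baseline translation steps have already run, so each source-language sentence comes with a machine-translated hypothesis in the target language. The abstract does not name a retrieval model or a filtering criterion, so the sketch swaps in plain TF-IDF cosine similarity with a fixed threshold. The function names (`extract_pairs`, `build_idf`, and so on), the tokenizer, and the threshold value are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_idf(corpus_tokens):
    """Smoothed inverse document frequency over the target-language corpus."""
    n_docs = len(corpus_tokens)
    doc_freq = Counter()
    for tokens in corpus_tokens:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the corpus are ignored."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def extract_pairs(source_sentences, smt_hypotheses, target_corpus, threshold=0.5):
    """Pair each source sentence with its best-matching target sentence.

    The translated hypothesis is used as the query; a pair is kept only if
    the best similarity score reaches the (assumed) threshold.
    """
    corpus_tokens = [tokenize(s) for s in target_corpus]
    idf = build_idf(corpus_tokens)
    corpus_vectors = [tfidf_vector(t, idf) for t in corpus_tokens]

    pairs = []
    for src, hyp in zip(source_sentences, smt_hypotheses):
        query = tfidf_vector(tokenize(hyp), idf)
        scores = [cosine(query, vec) for vec in corpus_vectors]
        if not scores:
            continue
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            pairs.append((src, target_corpus[best], scores[best]))
    return pairs


if __name__ == "__main__":
    # Toy data: English transcripts, their French SMT output, a French corpus.
    source = ["the parliament adopted the budget", "totally unrelated talk"]
    hypotheses = ["le parlement a adopté le budget", "xyz qwerty"]
    target = [
        "le parlement a adopté le budget hier",
        "il pleut à paris",
        "le chat dort",
    ]
    for src, tgt, score in extract_pairs(source, hypotheses, target):
        print(f"{score:.2f}\t{src}\t{tgt}")
```

Querying with the translated hypothesis rather than the source sentence keeps the whole search inside the target language, which is what lets a monolingual retrieval method be used here. The threshold trades recall for precision: a higher value keeps fewer but cleaner pairs.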

Journal: Natural Language Engineering
Publication Date: Jun 15, 2016
Citations: 5
