Abstract

Neural encoder-decoder models of machine translation have achieved impressive results, learning linguistic knowledge of both the source and target languages in an implicit, end-to-end manner. We propose a framework in which the model begins by learning syntax and translation in an interleaved fashion, then gradually shifts its focus to translation. Using this approach, we achieve considerable improvements in BLEU score on a relatively large parallel corpus (WMT14 English-to-German) and in a low-resource setup (WIT German-to-English).
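The schedule described above can be made concrete with a toy sketch. The snippet below illustrates one plausible scheme, in which the probability of drawing a translation batch (rather than a syntax batch) follows a sigmoid in the training epoch, starting from an even interleaving and saturating toward translation-only training. The sigmoid form and the name `p_translation` are assumptions for illustration, not the paper's exact schedule; `alpha` plays the role of the slope parameter α mentioned in the highlights below.

```python
import math
import random

def p_translation(epoch, alpha=0.5):
    """Probability of sampling the translation task at a given epoch.

    Sigmoid schedule (illustrative, not necessarily the paper's exact
    form): starts at 0.5 (syntax and translation evenly interleaved)
    and approaches 1.0 (translation only). The slope parameter alpha
    controls how quickly the focus shifts toward translation.
    """
    return 1.0 / (1.0 + math.exp(-alpha * epoch))

# Simulate how the task mix evolves over ten epochs.
random.seed(0)
for epoch in range(10):
    p = p_translation(epoch)
    task = "translation" if random.random() < p else "syntax"
    print(f"epoch {epoch}: P(translation) = {p:.2f} -> sampled task: {task}")
```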

Highlights

  • Neural Machine Translation (NMT) (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Bahdanau et al., 2014) has recently become the state-of-the-art approach to machine translation (Bojar et al., 2016).

  • As a side product of our research, we show that dependency parsing can be approached with the sequence-to-sequence attention model commonly used for neural machine translation, by predicting linearized dependency trees (see the sketch after this list).

  • We examine the effect of Scheduled Multi-Task Learning on translation quality compared to the baseline system, with a constant value of the slope parameter (α) set to 0.5. We show how much representation bias each model chose to acquire by testing it on each of the auxiliary tasks.
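To make the linearization idea from the second highlight concrete, here is a minimal sketch of one way to flatten a dependency tree into a target sequence that a standard attentional sequence-to-sequence model can predict. The offset-plus-label encoding and the function name `linearize` are illustrative assumptions, not necessarily the exact scheme used in the paper.

```python
def linearize(tokens, heads, labels):
    """Flatten a dependency tree into a target token sequence.

    tokens: words of the source sentence
    heads:  1-based index of each word's head (0 marks the root)
    labels: dependency relation of each word

    Each word is encoded as "<relative head offset>_<label>", so the
    decoder emits one tag per source word. This encoding is
    illustrative; other linearizations (e.g., bracketed trees) work
    with the same architecture.
    """
    target = []
    for i, (head, label) in enumerate(zip(heads, labels), start=1):
        tag = f"{head - i:+d}_{label}" if head != 0 else f"0_{label}"
        target.append(tag)
    return target

# "the dog barks": "the" attaches to "dog" (word 2), "dog" to
# "barks" (word 3), and "barks" is the root.
print(linearize(["the", "dog", "barks"], [2, 3, 0], ["det", "nsubj", "root"]))
# -> ['+1_det', '+1_nsubj', '0_root']
```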


Summary

Introduction

Neural Machine Translation (NMT) (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Bahdanau et al., 2014) has recently become the state-of-the-art approach to machine translation (Bojar et al., 2016). One of the main advantages of neural approaches is the impressive ability of RNNs to act as feature extractors over the entire input (Kiperwasser and Goldberg, 2016), rather than focusing on local information. Neural architectures are able to extract linguistic properties from the input sentence in the form of morphology (Belinkov et al., 2017) or syntax (Linzen et al., 2016). Providing explicit linguistic information (Dyer et al., 2016; Kuncoro et al., 2017; Niehues and Cho, 2017; Sennrich and Haddow, 2016; Eriguchi et al., 2017; Aharoni and Goldberg, 2017; Nadejde et al., 2017; Bastings et al., 2017; Matthews et al., 2018) has proven beneficial, yielding improvements in language modeling and machine translation. Tasks such as multiword expression detection and part-of-speech tagging have been found very useful for others such as combinatory categorial grammar (CCG) parsing, chunking, and super-sense tagging (Bingel and Søgaard, 2017).
