Abstract

Neural encoder-decoder models of machine translation have achieved impressive results while learning linguistic knowledge of both the source and target languages in an implicit, end-to-end manner. We propose a framework in which the model starts by learning syntax and translation in an interleaved fashion and gradually shifts its focus to translation. Using this approach, we achieve considerable BLEU improvements on a relatively large parallel corpus (WMT14 English to German) and in a low-resource setting (WIT German to English).

Highlights

  • Neural Machine Translation (NMT) (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Bahdanau et al., 2014) has recently become the state-of-the-art approach to machine translation (Bojar et al., 2016).

  • As a side product of our research, we show that dependency parsing can be approached with the sequence-to-sequence attention model commonly used for neural machine translation, by training it on linearized dependency trees (see the first sketch after this list).

  • We examine the effect of Scheduled Multi-Task Learning on translation quality compared to a baseline system with a constant value of the slope parameter (α) set to 0.5 (see the second sketch after this list). We show the amount of representation bias the models choose to acquire by testing each model on each of the auxiliary tasks.
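
The linearization idea can be made concrete with a small sketch. The encoding below (one output symbol per token, combining the head's relative offset with the arc label) is an illustrative assumption of ours, not necessarily the paper's exact scheme; the toy sentence and the function name are hypothetical.

    def linearize(tokens, heads, labels):
        """Turn a dependency tree into a flat target sequence that a
        seq2seq-with-attention model can be trained to emit, one symbol
        per source token. heads are 1-based indices (0 marks the root)."""
        target = []
        for i, (head, label) in enumerate(zip(heads, labels), start=1):
            if head == 0:
                target.append(f"ROOT/{label}")
            else:
                target.append(f"{head - i:+d}/{label}")  # relative head offset
        return target

    # Toy example: "the dog barks" (the -> dog, dog -> barks, barks = root)
    print(linearize(["the", "dog", "barks"], [2, 3, 0], ["det", "nsubj", "root"]))
    # ['+1/det', '+1/nsubj', 'ROOT/root']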
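
The scheduled task sampling can likewise be sketched in a few lines. The constant baseline draws a translation batch with fixed probability α = 0.5; the sigmoid variant below is one plausible schedule of our own devising, in which α governs how steeply training shifts from mostly syntax to mostly translation, so the exact curve used in the paper may differ.

    import math
    import random

    def p_translation(step, total_steps, schedule="constant", alpha=0.5):
        """Probability of drawing a translation batch at this training step."""
        if schedule == "constant":
            return alpha  # baseline: fixed mixing ratio (0.5 here)
        if schedule == "sigmoid":
            # Assumed role of alpha: a steeper slope means a faster handover
            # from the auxiliary syntax task to the main translation task.
            t = step / total_steps  # training progress in [0, 1]
            return 1.0 / (1.0 + math.exp(-12.0 * alpha * (t - 0.5)))
        raise ValueError(f"unknown schedule: {schedule}")

    def sample_task(step, total_steps, **kwargs):
        """Pick which task the next mini-batch comes from."""
        if random.random() < p_translation(step, total_steps, **kwargs):
            return "translation"
        return "syntax"

    # Early in training the scheduled model mostly parses; late, it mostly translates.
    print(p_translation(0, 100_000, schedule="sigmoid"))        # ~0.05
    print(p_translation(100_000, 100_000, schedule="sigmoid"))  # ~0.95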


Summary

Introduction

Neural Machine Translation (NMT) (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Bahdanau et al., 2014) has recently become the state-of-the-art approach to machine translation (Bojar et al., 2016). One of the main advantages of neural approaches is the impressive ability of RNNs to act as feature extractors over the entire input (Kiperwasser and Goldberg, 2016), rather than focusing on local information alone. Neural architectures are able to extract linguistic properties from the input sentence in the form of morphology (Belinkov et al., 2017) or syntax (Linzen et al., 2016). Providing explicit linguistic information (Dyer et al., 2016; Kuncoro et al., 2017; Niehues and Cho, 2017; Sennrich and Haddow, 2016; Eriguchi et al., 2017; Aharoni and Goldberg, 2017; Nadejde et al., 2017; Bastings et al., 2017; Matthews et al., 2018) has proven beneficial, yielding better results in language modeling and machine translation. Tasks such as multiword-expression detection and part-of-speech tagging have been found very useful for others such as combinatory categorial grammar (CCG) parsing, chunking, and supersense tagging (Bingel and Søgaard, 2017).

