Abstract

In this paper, we propose a novel recursive recurrent neural network (R²NN) to model the end-to-end decoding process for statistical machine translation. R²NN is a combination of a recursive neural network and a recurrent neural network, and integrates their respective capabilities: (1) new information can be used to generate the next hidden state, as in recurrent neural networks, so that the language model and the translation model can be integrated naturally; (2) a tree structure can be built, as in recursive neural networks, so that translation candidates are generated in a bottom-up manner. A semi-supervised approach is proposed to train the parameters, and phrase pair embeddings are explored to model translation confidence directly. Experiments on a Chinese-to-English translation task show that our proposed R²NN outperforms the state-of-the-art baseline by about 1.5 BLEU points.

Highlights

  • Deep Neural Network (DNN), which is essentially a multi-layer neural network, has regained increasing attention in recent years

  • The word embedding x_t is integrated as new input information in recurrent neural networks for each prediction, whereas in recursive neural networks no additional input is used beyond the representation vectors of the two child nodes (see the sketch after this list)

  • We propose a Recursive Recurrent Neural Network (R²NN) that combines the recurrent neural network and the recursive neural network

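As a rough illustration of the contrast drawn in the second highlight, the sketch below writes both update rules in NumPy. The weight matrices `W`, `U`, `V` and the `tanh` nonlinearity are illustrative assumptions, not notation taken from the paper.

```python
import numpy as np

# Recurrent unit: the word embedding x_t enters as new input at every
# step and is combined with the previous hidden state h_prev.
def recurrent_step(W, U, h_prev, x_t):
    return np.tanh(W @ x_t + U @ h_prev)

# Recursive unit: a parent representation is composed only from its two
# children; no additional input is consumed.
def recursive_compose(V, h_left, h_right):
    return np.tanh(V @ np.concatenate([h_left, h_right]))
```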

Summary

Introduction

Deep Neural Network (DNN), which is essentially a multi-layer neural network, has regained increasing attention in recent years. DNN has been introduced to Statistical Machine Translation (SMT) to learn several components or features of the conventional framework, including word alignment, language modelling, translation modelling and distortion modelling. Auli et al. (2013) propose a joint language and translation model based on a recurrent neural network; their model predicts a target word given an unbounded history of both source and target words. Different from the work mentioned above, which applies DNN to components of the conventional SMT framework, in this paper we propose a novel R²NN to model the end-to-end decoding process. In a purely recursive neural network, all node representations are generated from their child nodes alone, so it is difficult to integrate additional global information, such as a language model or a distortion model; R²NN overcomes this by feeding a global recurrent input vector into each composition step.
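To make the combination concrete, here is a minimal sketch of one R²NN composition step, under the same illustrative assumptions as above; the function and variable names are hypothetical. Two child representations are merged bottom-up as in a recursive network, while a global recurrent input vector (carrying span-level information such as language model scores) enters as in a recurrent network.

```python
import numpy as np

def r2nn_compose(V, W, h_left, h_right, x_global):
    # Recursive part: compose the two child representations.
    children = np.concatenate([h_left, h_right])
    # Recurrent part: inject the global input vector for this span,
    # e.g. language model and distortion features (illustrative).
    return np.tanh(V @ children + W @ x_global)
```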

Related Work
Our Model
Recurrent Neural Network
Recursive Neural Network
Recursive Recurrent Neural Network
Model Training
Unsupervised Pre-training
Supervised Local Training
Supervised Global Training
Phrase Pair Embedding
Translation Confidence with Sparse Features
Experiments and Results
Data Setting and Baseline
Translation Results
Effects of Global Recurrent Input Vector
Sparse Features and Recurrent Network Features
Conclusion and Future Work