Abstract

Neural machine translation (NMT) has achieved state-of-the-art performance in many translation tasks. However, because the computational cost grows with the size of the search space for predicting target words, NMT systems use a limited vocabulary, which constrains translation quality. To alleviate this problem, we propose a novel dynamic hierarchical decoder for NMT that uses all of the target words in both training and decoding. In the proposed model, a target word is represented by two latent attribute vectors rather than a single word vector. The model is trained to dynamically group words that share similar linguistic attributes. Predicting a target word therefore becomes predicting its attribute vectors, with the $\mathrm{softmax}$ functions applied at the attribute level. This greatly reduces the model size and the decoding time. Our experimental results demonstrate that the proposed model significantly outperforms the NMT baselines on both Chinese-English and English-German translation tasks.
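
The idea of replacing one softmax over the whole vocabulary with softmax functions at the attribute level can be illustrated with a generic two-level factorization. The sketch below is not the model from the paper: the function names, the toy sizes, and the way the second level is conditioned on the first (adding the chosen first-attribute vector to the hidden state) are all assumptions made for illustration. It only shows why scoring K1 + K2 attributes can stand in for scoring K1 * K2 words while still giving a valid distribution.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def two_level_word_prob(hidden, first_attr, second_attr, i, j):
    """P(word) = P(first attribute i | hidden) * P(second attribute j | hidden, i).

    A word is identified by the attribute pair (i, j). The conditioning step
    (hidden + first_attr[i]) is an illustrative assumption, not the paper's method.
    """
    p_first = softmax([dot(hidden, a) for a in first_attr])
    conditioned = [h + a for h, a in zip(hidden, first_attr[i])]
    p_second = softmax([dot(conditioned, b) for b in second_attr])
    return p_first[i] * p_second[j]


# Toy setup (hypothetical sizes): 5 * 6 = 30 words, but only 5 + 6 = 11 scores per prediction.
random.seed(0)
dim, k1, k2 = 4, 5, 6
hidden = [random.gauss(0, 1) for _ in range(dim)]
first_attr = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(k1)]
second_attr = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(k2)]

# Summing over every (i, j) pair checks that the factorized scores form a distribution.
total = sum(
    two_level_word_prob(hidden, first_attr, second_attr, i, j)
    for i in range(k1)
    for j in range(k2)
)
scores_flat = k1 * k2      # scores a single flat softmax would need
scores_two_level = k1 + k2  # scores the two-level version needs per word
```

With a realistic vocabulary the gap between `scores_flat` and `scores_two_level` is much larger, which is the general reason attribute-level prediction can shrink the output layer and speed up decoding.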
