Abstract

Neural network methods exhibit strong performance only in a few resource-rich domains. Practitioners therefore employ domain adaptation from resource-rich domains that are, in most cases, distant from the target domain. Domain adaptation between distant domains (e.g., movie subtitles and research papers), however, cannot be performed effectively due to vocabulary mismatches: the model encounters many domain-specific words (e.g., “angstrom”) and words whose meanings shift across domains (e.g., “conductor”). In this study, aiming to solve these vocabulary mismatches in domain adaptation for neural machine translation (NMT), we propose vocabulary adaptation, a simple method for effective fine-tuning that adapts the embedding layers of a given pretrained NMT model to the target domain. Prior to fine-tuning, our method replaces the embedding layers of the NMT model by projecting general word embeddings, induced from monolingual data in the target domain, onto the source-domain embedding space. Experimental results indicate that our method improves the performance of conventional fine-tuning by 3.86 and 3.28 BLEU points in En-Ja and De-En translation, respectively.
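
The projection step can be pictured with a small sketch. The following is a minimal, hypothetical illustration of the linear-projection variant of this idea, not the authors' implementation: CBOW embeddings trained on target-domain monolingual data are mapped into the source-domain NMT embedding space via a least-squares linear map estimated on the words shared by both vocabularies, and the projected vectors replace the NMT embedding layer before fine-tuning. All array and variable names (nmt_emb, cbow_tgt, src_vocab, tgt_vocab) are assumptions introduced for illustration.

```python
# Hypothetical sketch of vocabulary adaptation via a linear projection, assuming:
#  - nmt_emb:  source-domain NMT embedding matrix, shape (|V_src|, d_nmt)
#  - cbow_tgt: CBOW vectors trained on target-domain monolingual data, shape (|V_tgt|, d_cbow)
#  - src_vocab, tgt_vocab: dicts mapping word -> row index in each matrix
import numpy as np

def vocabulary_adaptation(nmt_emb, src_vocab, cbow_tgt, tgt_vocab):
    # 1) Use words shared by both vocabularies as anchors for the projection.
    shared = [w for w in tgt_vocab if w in src_vocab]
    X = np.stack([cbow_tgt[tgt_vocab[w]] for w in shared])   # target-domain CBOW space
    Y = np.stack([nmt_emb[src_vocab[w]] for w in shared])    # source-domain NMT space

    # 2) Learn a linear map W minimizing ||XW - Y||^2 (least squares).
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)

    # 3) Project every target-domain word into the NMT embedding space; the result
    #    becomes the new embedding layer used during fine-tuning.
    new_emb = cbow_tgt @ W                                    # shape (|V_tgt|, d_nmt)
    return new_emb
```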

Highlights

  • The performance of neural machine translation (NMT) models drops remarkably in domains different from the training data (Koehn and Knowles, 2017)

  • Vocabulary Adaptation (VA)-* methods did not work well in En→Ja translation when only the 100k-sentence target-domain parallel data was used. This is probably because the noisier embeddings introduced by the large number of domain-specific words in the Asian Scientific Paper Excerpt Corpus (ASPEC) dataset (Table 1) hinder the embedding projection of VA-locally linear mapping (LLM) and VA-linear transformation (Linear), since the Continuous Bag-of-Words (CBOW) vectors trained from only the 100k sentences are of low quality (see the projection sketch after this list)

  • We tackled the vocabulary mismatch problem in domain adaptation for NMT, and we proposed vocabulary adaptation, a simple but direct solution to this problem
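
For context on the locally linear mapping mentioned above, here is a hedged sketch of an LLM-style projection under the same assumed arrays as the earlier sketch (nmt_emb, cbow_tgt, src_vocab, tgt_vocab); it illustrates the general technique, not the authors' code. Each target-domain word's CBOW vector is reconstructed from its k nearest shared-vocabulary neighbours in the target-domain space, and the same reconstruction weights are reused to combine those neighbours' source-domain NMT embeddings.

```python
# Hedged sketch of a locally-linear-mapping (LLM) style projection.
# Hypothetical inputs as before: nmt_emb (|V_src|, d_nmt), cbow_tgt (|V_tgt|, d_cbow),
# src_vocab and tgt_vocab mapping word -> row index.
import numpy as np

def llm_project(nmt_emb, src_vocab, cbow_tgt, tgt_vocab, k=10, reg=1e-3):
    shared = [w for w in tgt_vocab if w in src_vocab]
    anchors_cbow = np.stack([cbow_tgt[tgt_vocab[w]] for w in shared])
    anchors_nmt = np.stack([nmt_emb[src_vocab[w]] for w in shared])

    new_emb = np.zeros((len(tgt_vocab), nmt_emb.shape[1]))
    for word, i in tgt_vocab.items():
        x = cbow_tgt[i]
        # k nearest shared words in the target-domain CBOW space
        dists = np.linalg.norm(anchors_cbow - x, axis=1)
        nn = np.argsort(dists)[:k]
        # solve for reconstruction weights (LLE-style, with a small ridge term)
        Z = anchors_cbow[nn] - x                      # centred neighbours
        C = Z @ Z.T + reg * np.eye(k)                 # regularised local Gram matrix
        w_rec = np.linalg.solve(C, np.ones(k))
        w_rec /= w_rec.sum()                          # weights sum to 1
        # reuse the weights in the source-domain NMT embedding space
        new_emb[i] = w_rec @ anchors_nmt[nn]
    return new_emb
```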

Summary

Introduction

The performance of neural machine translation (NMT) models drops remarkably in domains different from the training data (Koehn and Knowles, 2017). [Figure: a source-domain encoder-decoder NMT model is adapted into a target-domain NMT model using target-domain monolingual data.] Given source-domain parallel data and a small amount of target-domain parallel data, fine-tuning adjusts the parameters of a model pre-trained in the source domain to the target domain. In fine-tuning, inheriting the embedding layers of the model pre-trained in the source domain causes vocabulary mismatches; namely, the model can handle neither domain-specific words that are not covered by the small amount of target-domain parallel data (unknown words) nor words that have different meanings across domains (semantic shift).
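
To make the two mismatch types concrete, here is a toy illustration (not from the paper) using the example words from the abstract; the vocabulary and indices are invented purely for illustration.

```python
# Toy illustration of the two vocabulary mismatches (hypothetical vocabulary).
src_vocab = {"<unk>": 0, "the": 1, "conductor": 2}   # invented source-domain vocab

def to_ids(tokens, vocab):
    return [vocab.get(t, vocab["<unk>"]) for t in tokens]

print(to_ids(["the", "angstrom"], src_vocab))   # [1, 0]: "angstrom" is an unknown word
print(to_ids(["the", "conductor"], src_vocab))  # [1, 2]: indexed, but the inherited
                                                # embedding encodes the source-domain
                                                # sense, not the target-domain sense
```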