Abstract
Neural machine translation (NMT) has recently gained substantial popularity not only in academia, but also in industry. For its acceptance in industry, it is important to investigate how NMT performs in comparison to the phrase-based statistical MT (PBSMT) model, which until recently was the dominant MT paradigm. In the present work, we compare the quality of the PBSMT and NMT solutions of KantanMT—a commercial platform for custom MT—that are tailored to accommodate large-scale translation production, where there is a limited amount of time to train an end-to-end system (NMT or PBSMT). To satisfy the time requirements of our production line, we restrict the NMT training time to 4 days; training a PBSMT system typically requires no more than one day with the current KantanMT training pipeline. To train the NMT and PBSMT engines for each language pair, we use exactly the same parallel corpora and the same pre- and post-processing steps (where applicable). Our results show that, even with time-restricted training of 4 days, NMT quality substantially surpasses that of PBSMT. Furthermore, we challenge the reliability of automatic quality evaluation metrics based on n-gram comparison (in particular F-measure, BLEU and TER) for NMT quality evaluation. We support our hypothesis with both analytical and empirical evidence, and investigate how suitable these metrics are for comparing the two paradigms.
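To make concrete what "n-gram comparison" means in metrics such as BLEU, the sketch below implements a simplified, single-reference sentence-level BLEU in plain Python. This is not the paper's evaluation code, only an illustration of the standard formulation (clipped n-gram precisions combined by a geometric mean, with a brevity penalty); production evaluations would use a smoothed, corpus-level implementation.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams of the token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hypothesis, reference, max_n=4):
    """Simplified sentence-level BLEU against a single reference:
    geometric mean of clipped n-gram precisions times a brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_ngrams.values())
        if total == 0:
            return 0.0  # hypothesis shorter than n tokens
        # Clip each n-gram count by its count in the reference.
        clipped = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        if clipped == 0:
            return 0.0  # avoid log(0); real implementations apply smoothing
        precisions.append(clipped / total)
    # Brevity penalty punishes hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

Because such metrics only count surface n-gram overlap, a fluent NMT output that paraphrases the reference can score poorly, which is one analytical motivation for questioning their reliability when comparing paradigms.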