Abstract
Reordering is one of the most important factors affecting output quality in statistical machine translation (SMT). Many of the approaches proposed to address the reordering problem are discriminative reordering models (DRMs). The core component of a DRM is a classifier that tries to predict the correct word order of the sentence. Unfortunately, the relationship between classification quality and ultimate SMT performance has not been investigated to date. Understanding this relationship would allow researchers to select the classifier that yields the best possible MT quality. It might be assumed that the relationship is monotonic, i.e., that any improvement in classification performance is reflected in improved overall SMT quality. In this paper, we show experimentally that this assumption does not always hold: an improvement in classification performance might actually degrade the quality of an SMT system as measured by automatic MT evaluation metrics. However, we show that if the improvement in classification performance is large enough, we can expect SMT quality to improve as well. In addition, we show that there is a negative relationship between classification accuracy and SMT performance in imbalanced parallel corpora. For these corpora, we provide evidence that macro-averaged metrics such as the macro-averaged F-measure are better suited to evaluating the classifier than accuracy, the metric commonly used to date.
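The contrast between accuracy and macro-averaged F-measure on imbalanced data can be illustrated with a small sketch. Everything in it is an assumption made for illustration, not taken from the paper: the two orientation labels ("monotone" and "swap"), the 90/10 class split, and both sets of predictions are hypothetical toy data. It only shows why a classifier that ignores the minority class can score well on accuracy while scoring poorly on macro-averaged F1.

```python
def accuracy(gold, pred):
    """Fraction of examples whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced toy data: 90 "monotone" and 10 "swap" examples.
gold = ["monotone"] * 90 + ["swap"] * 10

# Classifier A always predicts the majority class.
pred_a = ["monotone"] * 100

# Classifier B finds every "swap" but mislabels 12 "monotone" examples.
pred_b = ["monotone"] * 78 + ["swap"] * 12 + ["swap"] * 10

for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(name, round(accuracy(gold, pred), 3), round(macro_f1(gold, pred), 3))
# prints:
# A 0.9 0.474
# B 0.88 0.777
```

On this toy data, accuracy ranks classifier A above B even though A never predicts a swap, while macro-averaged F1 ranks B clearly higher. The sketch says nothing about which classifier would yield better translations; that is the empirical question the paper addresses.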
Highlights
Statistical Machine Translation (SMT) systems automatically translate from one natural language into another
We study the relationship between the performance of the reordering classifier and SMT quality in three parallel corpora from different language pairs, and experimentally show that improvements in classifier performance are not always reflected in SMT quality
We investigate the relationship between classification performance and SMT quality, and provide some guidelines for intrinsic evaluation of the classification performance in SMT
Summary
Statistical Machine Translation (SMT) systems automatically translate from one natural language into another. To the best of our knowledge, the relationship between the quality of the reordering classifier and SMT performance has not been studied to date. It might be assumed that improvements in classifier quality will be monotonically reflected in overall SMT performance. This assumption underlies previous work that tries to find the best classifier for an SMT system based solely on classifier quality metrics [2,3,4]. We study the relationship between the performance of the reordering classifier and SMT quality in three parallel corpora from different language pairs, and experimentally show that this assumption does not always hold.