Abstract

Automatic evaluation is a consistent and relatively inexpensive way to evaluate machine translation (MT), which makes it useful for developing MT systems. Automatic MT evaluation methods generally require reference (human) translations, and they fall into two types: one assesses machine translation by comparing it with reference translations, e.g., BLEU, NIST, and METEOR; the other evaluates machine translation by classifying it as good (human translation-like) or bad (machine translation-like) based on properties of the translation itself. Because preparing reference translations is costly, the former type is more expensive. By contrast, the latter is inexpensive, because it requires reference translations only for training a classifier with machine learning algorithms. Previous studies of classification-based evaluation assessed the validity of their methods by classification accuracy over machine translation as a whole. However, since the fluency (naturalness) of machine translation varies, classification results should reflect that fluency: more fluent machine translation should contain a smaller proportion of machine translation-like output than poor machine translation does. Therefore, this study assesses classification-based evaluation by the percentage of MT-like translations among more fluent translations. It further investigates the validity of classification-based evaluation by comparing it with reference translation-based evaluation and with manual evaluation by human evaluators. The experimental results show that our classification-based method can accurately evaluate the fluency of machine translation.
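To make the classification-based idea concrete, the following is a minimal, hypothetical sketch (not the paper's actual features, classifier, or data): a binary classifier is trained to label a translation as human-like or MT-like, and a system's output is then scored by the percentage of its sentences labeled MT-like, with a lower percentage suggesting more fluent output. The toy sentences, the TF-IDF n-gram features, and the logistic regression model are illustrative assumptions standing in for the richer linguistic features such classifiers typically use.

```python
# Hypothetical sketch of classification-based MT evaluation.
# Assumptions: toy data, TF-IDF word n-gram features, logistic regression;
# these are placeholders, not the method described in the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: 1 = human (reference) translation, 0 = machine translation.
train_sentences = [
    "The committee approved the proposal after a brief discussion.",   # human-like
    "Committee the proposal approved after discussion brief a.",       # MT-like
    "She carefully reviewed the report before sending it.",            # human-like
    "She the report carefully reviewed before it sending.",            # MT-like
]
train_labels = [1, 0, 1, 0]

# Train the good/bad (human-like/MT-like) classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_sentences, train_labels)

# Score a hypothetical MT system: the share of MT-like predictions is the
# quantity the study uses to assess fluency (lower = more fluent).
system_output = [
    "The minister answered the question during the press conference.",
    "Minister the question answered during conference press the.",
]
predictions = clf.predict(system_output)
mt_like_ratio = sum(1 for p in predictions if p == 0) / len(predictions)
print(f"Percentage of MT-like translations: {mt_like_ratio:.0%}")
```

Note that reference translations are needed here only to assemble the training set for the classifier, which is why this type of evaluation is cheaper than reference-based metrics once the classifier has been trained.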
