Abstract

With the rapid development of machine translation (MT), MT evaluation has become increasingly important for promptly indicating whether an MT system is making progress. Conventional MT evaluation methods calculate the similarity between hypothesis translations produced by automatic translation systems and reference translations produced by professional translators. Existing evaluation metrics have several weaknesses. First, their incomprehensive design factors lead to a language-bias problem: they perform well on certain language pairs but poorly on others. Second, they tend to use either no linguistic features or too many; the former draws criticism from linguists, while the latter makes the models difficult to reproduce. Third, the reference translations they rely on are expensive to produce and sometimes unavailable in practice. In this paper, the authors propose an unsupervised MT evaluation metric that uses a universal part-of-speech tagset and does not rely on reference translations. The authors also explore the performance of the designed metric on traditional supervised evaluation tasks. Both the supervised and unsupervised experiments show that the designed methods yield higher correlation scores with human judgments.

Highlights

  • Research on machine translation (MT) dates back more than fifty years [1], and with the rapid development of computer technology people have benefited greatly from it for information exchange

  • Human evaluation is expensive and time consuming, which has led to automatic evaluation metrics that provide quick and inexpensive assessment of MT systems

  • Most automatic MT evaluation metrics are reference-aware: they employ different approaches to calculate the closeness between the hypothesis translations produced by MT systems and the reference translations provided by professional translators


Summary

Introduction

Research on machine translation (MT) dates back more than fifty years [1], and with the rapid development of computer technology people have benefited greatly from it for information exchange. Human evaluation approaches, such as the adequacy and fluency criteria, have been used to estimate the quality of MT systems. However, human evaluation is expensive and time consuming, which has led to automatic evaluation metrics that provide quick and inexpensive assessment of MT systems. Automatic evaluation metrics can also be used to tune MT systems for better output quality. Most automatic MT evaluation metrics are reference-aware: they employ different approaches to calculate the closeness between the hypothesis translations produced by MT systems and the reference translations provided by professional translators. This paper proposes an automatic evaluation approach for English-to-German translation that calculates the similarity between the source sentence and the hypothesis translation without using reference translations, as sketched below. The potential use of the proposed evaluation algorithms in traditional reference-aware MT evaluation tasks is also explored.
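
To make the reference-free idea concrete, the following minimal Python sketch scores a hypothesis by comparing its part-of-speech n-grams against those of the source sentence, assuming both sentences have already been tagged and mapped to the universal POS tagset. The function names (pos_ngram_f1, ngrams), the choice of n-gram F1, and the equal weighting of n-gram orders are illustrative assumptions, not the paper's exact scoring formula.

    from collections import Counter

    def ngrams(tags, n):
        """Return the n-grams over a sequence of POS tags."""
        return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]

    def pos_ngram_f1(source_tags, hypothesis_tags, max_n=3):
        """Average n-gram F1 between the universal-POS sequences of the
        source sentence and the hypothesis translation (no reference needed)."""
        scores = []
        for n in range(1, max_n + 1):
            src = Counter(ngrams(source_tags, n))
            hyp = Counter(ngrams(hypothesis_tags, n))
            if not src or not hyp:
                continue
            overlap = sum((src & hyp).values())   # clipped n-gram matches
            precision = overlap / sum(hyp.values())
            recall = overlap / sum(src.values())
            if precision + recall == 0:
                scores.append(0.0)
            else:
                scores.append(2 * precision * recall / (precision + recall))
        return sum(scores) / len(scores) if scores else 0.0

    # Hypothetical example: English source vs. German hypothesis,
    # both already mapped to the universal POS tagset.
    source_pos = ["PRON", "VERB", "DET", "NOUN", "."]      # "I read the book ."
    hypothesis_pos = ["PRON", "VERB", "DET", "NOUN", "."]  # "Ich lese das Buch ."
    print(pos_ngram_f1(source_pos, hypothesis_pos))

Because both sides are reduced to the same universal tagset, the comparison crosses the language boundary without needing a reference translation; the tagging step itself would be done with language-specific taggers for the source and target languages.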
