Abstract

The number of sentence pairs in a bilingual corpus is key to translation accuracy in machine translation. Beyond a certain corpus size, however, additional sentence pairs have a diminishing effect on translation quality, while building the translation system still requires considerable time and effort, which hinders the development of statistical machine translation. This article proposes several criteria for measuring the amount of information in each sentence pair and uses the Heuristic Bilingual Graph Corpus Network (HBGCN) to form an improved corpus selection method that takes into account the differences in the initial amount of information between sentence pairs. In our experiments, training on the subset chosen by the graph-based selection method yields translation results close to those obtained with the whole corpus, and better results than a baseline based on the Document Inverse Frequency (DIF) ranking approach.
