Abstract

In recent years, with increasing international communication and cooperation, consistent toponymic information across countries has become increasingly important. A large number of English geographical names urgently need to be translated into Chinese, yet there are currently few studies on the machine translation of geographical names. This paper therefore proposes a method for automatically translating English geographical names into Chinese. First, the lexical structure of each geographical name is analyzed to divide the whole name into two parts, the special name and the general name, using a statistical template model that applies pointwise mutual information (PMI) and a directed acyclic graph (DAG) to names extracted from different categories of a geographical name corpus. Second, the two parts are translated separately. The general name is translated directly by free translation. To transliterate the special name, phonetic symbols are generated with a recurrent neural network, divided into syllables based on minimum entropy, and converted into Chinese characters. Finally, the two Chinese parts are combined, and criteria derived from the translation process are used to evaluate translation reliability, enabling automatic quality inspection and screening of the translated names. Experimental results show that the method is effective for translating English geographical names into Chinese, and it can be readily extended to other languages such as Arabic.
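The abstract does not spell out how minimum entropy drives syllable division. One reading, assumed here and not confirmed by the source, is to choose the segmentation of a phoneme sequence that minimizes total self-information, the sum of -log2 p(syllable), which is a shortest-path search over a syllable lattice solvable by dynamic programming. In the sketch below the phoneme sequence is supplied directly (standing in for the recurrent network's output), and the syllable probabilities `SYLLABLE_PROB`, the ARPAbet-style phoneme symbols, and the syllable-to-character table `HANZI` are hypothetical toy values.

```python
import math

# Hypothetical syllable probabilities over ARPAbet-style phonemes (toy values, not from the paper).
SYLLABLE_PROB = {
    ("M", "IH"): 0.20,
    ("S", "IH"): 0.25,
    ("P", "IY"): 0.20,
    ("IH", "S"): 0.10,
    ("M",): 0.05,
    ("S",): 0.05,
    ("P",): 0.05,
    ("IH",): 0.10,
    ("IY",): 0.10,
}

# Hypothetical syllable -> Chinese character table.
HANZI = {("M", "IH"): "密", ("S", "IH"): "西", ("P", "IY"): "比"}

MAX_SYLLABLE_LEN = 3  # longest syllable (in phonemes) considered


def min_entropy_split(phonemes):
    """Segment phonemes into syllables minimizing total self-information sum(-log2 p)."""
    n = len(phonemes)
    best = [math.inf] * (n + 1)  # best[i]: minimal cost of segmenting phonemes[:i]
    back = [0] * (n + 1)         # back[i]: start index of the last syllable in that segmentation
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_SYLLABLE_LEN), end):
            p = SYLLABLE_PROB.get(tuple(phonemes[start:end]))
            if p is None or best[start] == math.inf:
                continue
            cost = best[start] - math.log2(p)
            if cost < best[end]:
                best[end], back[end] = cost, start
    if best[n] == math.inf:
        raise ValueError("no syllable segmentation covers the whole sequence")
    # Walk the back-pointers from the end to recover the chosen syllables.
    syllables, end = [], n
    while end > 0:
        start = back[end]
        syllables.append(tuple(phonemes[start:end]))
        end = start
    return syllables[::-1], best[n]


def to_hanzi(syllables):
    """Map each syllable to a character via the hypothetical HANZI table."""
    return "".join(HANZI[s] for s in syllables)


syllables, cost = min_entropy_split(["M", "IH", "S", "IH", "S", "IH", "P", "IY"])
print(syllables, round(cost, 3), to_hanzi(syllables))
```

With these toy probabilities the four two-phoneme syllables win (total cost about 8.644 bits), and `to_hanzi` maps them to 密西西比. A real system would estimate the probabilities from aligned transliteration data and would need a fallback for syllables missing from `HANZI`.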

Highlights

  • A geographical name [1] is a proper name given to a geographical entity [2] at a specific spatial location, and it is an essential element of geographic information in spatial databases

  • As English is the most widely used language in the world, achieving efficient and accurate translation of English geographical names is important for enriching global geographic information resources

  • Pointwise mutual information (PMI) [24,25] measures how strongly two events are associated by comparing the probability of their joint occurrence with the probability expected from their marginal distributions if they were independent, i.e. PMI(x, y) = log p(x, y) / (p(x) p(y)); unlike mutual information, which averages over all outcomes, PMI concerns a single pair of outcomes
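PMI as defined in the highlight above can be sketched for adjacent word pairs, assuming word-level bigrams drawn from a toy list of English names. The corpus `TOY_NAMES` and the function `pmi_table` are illustrative and not taken from the paper. Probabilities are plain relative frequencies with no smoothing, so only pairs that actually occur are scored.

```python
import math
from collections import Counter

# Toy corpus of English geographical names (illustrative only).
TOY_NAMES = [
    "New York Bay",
    "New York Harbor",
    "Hudson River",
    "Mississippi River",
    "Red River",
    "Long Island Sound",
]


def pmi_table(names):
    """Return {(x, y): PMI} for adjacent word pairs, using unsmoothed relative frequencies."""
    unigrams = Counter()
    bigrams = Counter()
    for name in names:
        words = name.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    table = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi            # joint probability of the adjacent pair
        p_x = unigrams[x] / n_uni      # marginal probability of the first word
        p_y = unigrams[y] / n_uni      # marginal probability of the second word
        table[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return table


scores = pmi_table(TOY_NAMES)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

On this toy list, `new york` scores log2 12.5 (about 3.64), higher than `red river` at log2(75/9) (about 3.06), because `river` co-occurs with several different words. How the paper combines such scores with its template model and DAG is not detailed in this summary.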

Summary

Introduction

A geographical name [1] is a proper name given to a geographical entity [2] at a specific spatial location, and it is an essential element of geographic information in spatial databases. National English–Chinese translation guidelines require that the special name be transliterated and the general name be translated by meaning (free translation), which keeps geographical names accurate and applicable across a wide range of uses. During translation, templates of the same category are matched against a geographical name in a nested manner, splitting its structure completely to generate a lexical structure tree [18,19]. The tree separates the name into two parts: the special name and the general name. The reliability of each translation is then measured [22] from index values derived from the translation process, enabling automatic quality inspection of geographical name translations.
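The split-then-combine step described above can be illustrated with a deliberately simple sketch. Instead of the paper's nested template matching and lexical structure tree, it assumes that the general name is the longest trailing word sequence found in a general-name dictionary, translates that part by lookup, and uses a lookup table in place of the transliteration pipeline. `GENERAL_NAMES`, `TRANSLITERATIONS`, `split_name`, and `translate` are all hypothetical.

```python
# Hypothetical general-name dictionary: English general name -> free translation.
GENERAL_NAMES = {
    "river": "河",
    "lake": "湖",
    "bay": "湾",
    "mountains": "山脉",
    "island": "岛",
}

# Hypothetical stand-in for the transliteration pipeline (phonemes -> syllables -> characters).
TRANSLITERATIONS = {
    "mississippi": "密西西比",
    "rocky": "落基",
    "hudson": "哈得孙",
}


def split_name(name):
    """Split a name into (special, general), taking the longest known general-name suffix."""
    words = name.split()
    # Start at index 1 so the special name keeps at least one word.
    for i in range(1, len(words)):
        suffix = " ".join(words[i:])
        if suffix.lower() in GENERAL_NAMES:
            return " ".join(words[:i]), suffix
    return name, ""  # no general name recognized


def translate(name):
    """Transliterate the special name, freely translate the general name, and join them."""
    special, general = split_name(name)
    special_zh = TRANSLITERATIONS.get(special.lower())
    if special_zh is None:
        return None  # no transliteration available: leave for manual review
    return special_zh + GENERAL_NAMES.get(general.lower(), "")


print(translate("Mississippi River"))  # 密西西比河
print(translate("Hudson Bay"))         # 哈得孙湾
```

The suffix rule fails for names whose general name comes first, such as "Lake Tahoe", which is one reason a template-based structural analysis is needed. Returning `None` when no transliteration exists only loosely resembles the paper's idea of screening unreliable translations.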

Pointwise Mutual Information
Template Expression of Geographical Name
Transliteration of Special Names based on Machine Learning
Syllable Division Method based on Minimum Entropy
Automatic Evaluation of Geographical Name Translation Results
Automatic Evaluation of Translation Results
Findings
Discussion