In traditional organization name translation methods, researchers usually assume that for every organization name to be translated, its correct translation exists somewhere on the web. Some researchers further assume that both the organization name to be translated and its correct translation appear on some mixed-language web pages. These researchers therefore consider web mining based methods appropriate for translating organization names. However, the correctness of these assumptions has never been verified. In this paper, we focus on this issue and experimentally test whether these two assumptions hold. From our experimental results, we identify several useful distribution characteristics of the organization names that appear on the web. Based on these distribution characteristics, we propose a practical Chinese-English organization name translation method. Experimental results show that our method is effective: it improves the inclusion rate of correct translations for those Chinese organization names whose correct translations really exist on the web, and it also improves the BLEU score and accuracy for those Chinese organization names whose correct translations rarely occur on the web.