Abstract

Analyzing and processing the data in health databases is highly valuable, and word alignment plays an important role in that work. Health databases contain many medical terms, and existing word alignment methods perform poorly on them because adequate term dictionaries are lacking. This paper proposes a method for Chinese-Japanese word alignment for health databases. The method is based on generalized intersection over the set form of a sentence-level aligned bilingual corpus. We use the GI (generalized intersection) model to align words. The GI model includes an algorithm based on generalized intersection operations on word sets and uses a special stop-word set to further improve recall. Experimental results indicate that the GI model performs well on health databases containing large numbers of medical terms, and on language pairs with few linguistic resources, such as Chinese and Japanese.
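
The abstract does not spell out the GI algorithm, so the following is only a rough sketch of the general idea of intersecting word sets over a sentence-aligned corpus, not the paper's actual model. It assumes that each sentence is already tokenized, that candidates for a source word are the target words shared by every aligned sentence containing it, and that stop words are simply filtered out first. The function name `align_by_intersection`, the toy corpus, and the stop-word list are all hypothetical.

```python
def align_by_intersection(corpus, source_word, stop_words=frozenset()):
    """Return target words common to all sentence pairs containing source_word.

    corpus: list of (source_tokens, target_tokens) pairs (assumed pre-tokenized).
    stop_words: target-side words to ignore (an illustrative assumption).
    """
    # Collect the target-side word set of every pair whose source side
    # contains the word we want to align, dropping stop words.
    target_sets = [
        set(target_tokens) - stop_words
        for source_tokens, target_tokens in corpus
        if source_word in source_tokens
    ]
    if not target_sets:
        return set()
    # Plain set intersection; the paper's "generalized" intersection
    # may well differ from this.
    return set.intersection(*target_sets)


# Toy Chinese-Japanese sentence pairs (hypothetical data).
corpus = [
    (["高血压", "患者"], ["高血圧", "の", "患者"]),
    (["高血压", "治疗"], ["高血圧", "の", "治療"]),
]
stop_words = frozenset({"の"})

candidates = align_by_intersection(corpus, "高血压", stop_words)
```

In this toy corpus, the only target word shared by both sentences containing 高血压 after stop-word filtering is 高血圧, so it is the sole candidate.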
