Chinese-English Bilingual Word Semantic Similarity Based on Chinese WordNet

Yangyang Wu, Siying Wu, Duansheng Chen

doi:10.17706/jsw.10.1.20-31

Abstract

Measuring the semantic similarity of multilingual words is a challenging problem in data mining, information extraction, information retrieval, and related fields. This paper introduces an algorithm that measures the semantic similarity of Chinese-English bilingual words based on Chinese WordNet, an extension of WordNet to Simplified Chinese. The algorithm measures not only the similarity between two Chinese words or two English words, but also the cross-lingual similarity between a Chinese word and an English word. It uses WordNet's hypernym/hyponym relations between synsets and evaluates similarity from three factors: the distance between the synsets, the local density of the synsets, and the depth of the synsets in the overall WordNet hierarchy. Because most words have more than one sense, the algorithm adaptively assigns weights to each pairing of the two words' synsets. Experimental results show that the similarities computed by the algorithm generally agree with human intuition.
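The abstract names the ingredients (synset distance, local density, depth, adaptive weighting over sense pairs) but not the formulas, so the following is only a minimal sketch of how such ingredients could be combined. It is not the authors' algorithm. Everything concrete in it is an assumption for illustration: the toy hierarchy `HYPERNYM`, the toy bilingual lexicon `LEXICON`, the parameters `alpha`, `beta`, `gamma`, and `temperature`, the choice of "number of direct hyponyms" as local density, the exponential-decay/tanh combination of distance and depth, and the softmax weighting of sense pairs. Cross-lingual comparison works here because Chinese and English words are assumed to map into the same synsets.

```python
import math
from itertools import product

# HYPOTHETICAL toy hypernym hierarchy (child -> parent); a real system would
# load WordNet and the Chinese WordNet word-to-synset mappings instead.
HYPERNYM = {
    "entity": None,
    "organism": "entity",
    "animal": "organism",
    "dog": "animal",
    "cat": "animal",
    "artifact": "entity",
    "vehicle": "artifact",
    "car": "vehicle",
}

# HYPOTHETICAL bilingual lexicon: Chinese and English words share synset IDs,
# which is what allows a Chinese word to be compared with an English word.
LEXICON = {
    "dog": ["dog"], "狗": ["dog"],
    "cat": ["cat"], "猫": ["cat"],
    "car": ["car"], "汽车": ["car"],
    "车": ["car", "vehicle"],  # assumed polysemous entry (two senses)
}

def path_to_root(s):
    """Return [s, parent, grandparent, ..., root]."""
    path = []
    while s is not None:
        path.append(s)
        s = HYPERNYM[s]
    return path

def local_density(s):
    """Assumed density measure: number of direct hyponyms of synset s."""
    return sum(1 for parent in HYPERNYM.values() if parent == s)

def synset_similarity(a, b, alpha=0.2, beta=0.6, gamma=0.5):
    """Assumed synset-pair score combining distance, density and depth."""
    pa, pb = path_to_root(a), path_to_root(b)
    ancestors_b = set(pb)
    lcs = next(x for x in pa if x in ancestors_b)  # lowest common subsumer
    dist = pa.index(lcs) + pb.index(lcs)           # edges from a to b via lcs
    depth = len(path_to_root(lcs))                 # the root has depth 1
    # Assumption: edges under a densely branching subsumer count as shorter.
    eff_dist = dist / (1 + gamma * local_density(lcs))
    # Shorter distance and a deeper subsumer both raise the score (0..1).
    return math.exp(-alpha * eff_dist) * math.tanh(beta * depth)

def word_similarity(w1, w2, temperature=0.1):
    """Weighted combination over every pairing of the two words' synsets."""
    sims = [synset_similarity(a, b)
            for a, b in product(LEXICON[w1], LEXICON[w2])]
    # Assumed adaptive weights: a softmax over pair scores, so the best
    # matching sense pair dominates while other pairs still contribute.
    exps = [math.exp(s / temperature) for s in sims]
    total = sum(exps)
    return sum(s * e / total for s, e in zip(sims, exps))

if __name__ == "__main__":
    for w1, w2 in [("dog", "狗"), ("dog", "猫"), ("狗", "car"), ("车", "car")]:
        print(w1, w2, round(word_similarity(w1, w2), 3))
```

With these toy values, "dog" and "狗" share a synset and score highest, "dog" and "猫" (siblings under `animal`) score well above "狗" and "car" (related only through the root), and the two-sense entry "车" is pulled toward its best-matching sense. The softmax is just one way to make the weights adapt to the data; a max over pairs or a plain average are the two extremes it interpolates between as `temperature` varies.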
