Abstract

This paper presents a method for self-organizing monolingual semantic maps: visible, continuous representations in which Chinese or Japanese words with similar meanings are placed at the same or neighboring points, so that the distance between words reflects their semantic similarity. We use a self-organizing map (SOM) as the self-organizing device. Each word to be self-organized is defined by a set of co-occurring words, collected from Chinese or Japanese newspapers according to their grammatical relationships. The words are then coded into vectors that are fed to the SOM; the coding takes into account the semantic correlation between words, which is established using a form of word-similarity computation. The self-organized monolingual semantic maps are assessed numerically by accuracy, recall, and the F-measure, as well as by intuition, and by comparison with a clustering method and with multivariate statistical analysis. The paper further discusses extending the proposed method to the construction of Japanese–Chinese bilingual semantic maps, with the aim of providing a semantics-based approach to word alignment in Japanese–Chinese parallel corpora. We show the effectiveness of this extended method through small-scale comparative experiments against three alternatives: a baseline method in which the alignment of Japanese and Chinese words is determined directly by the Euclidean distance between the vectors representing the words, a clustering method, and multivariate statistical analysis.
