Abstract

An unsupervised method for word-sense disambiguation using bilingual comparable corpora was developed. First, it extracts word associations, i.e., statistically significant pairs of associated words, from the corpus of each language. Then, it aligns word associations by consulting a bilingual dictionary and calculates correlation between senses of a target polysemous word and its associated words, which can be regarded as clues for identifying the sense of the target word. To overcome the problem of disparity of topical coverage between corpora of the two languages as well as the problem of ambiguity in word-association alignment, an algorithm for iteratively calculating a sense-vs.-clue correlation matrix for each target word was devised. Word-sense disambiguation for each instance of the target word is done by selecting the sense that maximizes the score, i.e., a weighted sum of the correlations between each sense and clues appearing in the context of the instance. An experiment using Wall Street Journal and Nihon Keizai Shimbun corpora together with the EDR bilingual dictionary showed that the new method has promising performance; namely, the Fmeasure of its sense selection was 74.6% compared to a baseline of 62.8%. The developed method will possibly be extended into a fully unsupervised method that features automatic division and definition of word senses.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.