Abstract

This article reports the results of a preliminary analysis of translation equivalents in four languages from different language families, extracted from an on-line parallel corpus of George Orwell's Nineteen Eighty-Four. The study has two goals: to determine the degree to which the different meanings of a polysemous English word are lexicalized differently across a variety of languages, and to determine whether this information can be used to structure or create a set of sense distinctions useful in natural language processing applications. A coherence index is computed that measures the tendency for different senses of the same English word to be lexicalized differently, and from these data a clustering algorithm is used to create sense hierarchies.
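The general idea can be illustrated with a minimal sketch. It is not the coherence index or the clustering algorithm used in the study, neither of which is defined in this abstract. It assumes a simple stand-in measure (the fraction of languages in which two senses receive different translations) and plain average-linkage agglomerative clustering. The sense labels, language names, translation tokens, and the function names `separation` and `cluster` are all hypothetical placeholders, and every pair of senses is assumed to share at least one language.

```python
from itertools import combinations

# Hypothetical toy data: one translation token per sense per language.
# The tokens are placeholders, not real corpus translations.
translations = {
    "bank/river":    {"lang_a": "x1", "lang_b": "y1", "lang_c": "z1", "lang_d": "w1"},
    "bank/finance":  {"lang_a": "x2", "lang_b": "y2", "lang_c": "z2", "lang_d": "w1"},
    "bank/building": {"lang_a": "x2", "lang_b": "y2", "lang_c": "z2", "lang_d": "w1"},
}


def separation(sense_a, sense_b, table):
    """Fraction of shared languages in which the two senses are translated differently."""
    langs = table[sense_a].keys() & table[sense_b].keys()
    differing = sum(table[sense_a][lang] != table[sense_b][lang] for lang in langs)
    return differing / len(langs)


def cluster(table):
    """Average-linkage agglomerative clustering; returns the merges in order."""
    clusters = [frozenset([sense]) for sense in sorted(table)]
    merges = []

    def linkage(pair):
        # Mean separation over all cross-cluster sense pairs.
        left, right = pair
        dists = [separation(a, b, table) for a in left for b in right]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Merge the two clusters whose senses are least often lexicalized differently.
        left, right = min(combinations(clusters, 2), key=linkage)
        merges.append((sorted(left), sorted(right), linkage((left, right))))
        clusters = [c for c in clusters if c not in (left, right)] + [left | right]
    return merges


merges = cluster(translations)
for left, right, distance in merges:
    print(left, "+", right, "at", distance)
```

In this toy data the two senses that share a translation in every language merge first, and the sense that is lexicalized differently in three of the four languages joins last, which yields a small two-level hierarchy.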
