Abstract

This paper draws on graph theory and optimization techniques to develop a new measure of word ambiguity (e.g., homonymy and polysemy) for use in psycholinguistic research. The measure quantifies the uncertainty about the intended meaning of an English word. Specifically, data on fifty thousand distinct words were collected from a corpus of close to six hundred million words. These data are used to generate word-association information, which forms the basis for semantic graphs from which clusters are created and analyzed. The clusters identify groups of words related to the different meanings of a word and are used to calculate a set of relative probabilities for those meanings. These probabilities are in turn used to calculate the information entropy of the word, which acts as a surrogate measure of ambiguity. A genetic algorithm is used to optimize the parameters of our word-association formula and of the graph clustering algorithm. The effectiveness of this approach is demonstrated with examples from psycholinguistic research.
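The final step described above, turning relative meaning probabilities into an entropy value, can be illustrated with a minimal sketch. The abstract does not say how cluster weights are derived or which logarithm base is used, so the function name `meaning_entropy`, the use of raw non-negative cluster weights as input, and the choice of base 2 (bits) are all assumptions made for illustration only.

```python
import math


def meaning_entropy(cluster_weights):
    """Shannon entropy, in bits, of a word's meaning distribution.

    cluster_weights: one non-negative number per meaning cluster
    (hypothetical input; e.g. a cluster size or summed association strength).
    """
    total = sum(cluster_weights)
    if total <= 0:
        raise ValueError("at least one cluster weight must be positive")
    # Normalize weights into relative probabilities; drop empty clusters,
    # since a zero-probability meaning contributes nothing to the entropy.
    probs = [w / total for w in cluster_weights if w > 0]
    # Base-2 logarithm is an assumption; the abstract does not state the base.
    return -sum(p * math.log2(p) for p in probs)
```

Under these assumptions, a word with a single meaning cluster has entropy 0, and a word with two equally weighted clusters has entropy of 1 bit; the more evenly the probability is spread across meanings, the higher the value.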
