Optimal distance metric function with trigram features for case based word sense disambiguation using artificial neural network

P Tamilselvi, S K Srivatsa

doi:10.1109/icoac.2011.6165190

Abstract

Word sense disambiguation generally draws on several levels of knowledge. In this paper, only three knowledge features (a trigram) are used to disambiguate a word. A case-based approach is applied: cases are refined forms of words collected from SemCor and are used to derive the sense of the ambiguous input word. All Part of Speech (PoS) tags listed in the Brown Corpus are collected and grouped into seventeen groups, and each group is assigned a constant value. The trigram features of the input (ambiguous) word and of each case are represented as a 1×3 vector. The vector values for the ambiguous word and its two neighboring words are taken from the weights assigned to their PoS groups. Ten different distance metric functions are analyzed empirically to improve disambiguation accuracy while using minimal knowledge sources. A neural network then extracts the correct sense of the ambiguous word from the cases at minimal distance. A single long sentence is used to demonstrate the performance of the disambiguation process. The results show that the post-trigram Hamming function (F9) achieves a disambiguation accuracy of 78.57% (eleven of the fourteen ambiguous words correctly recognized).
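The case-retrieval step described above (PoS-weighted 1×3 trigram vectors compared with a distance function, keeping the minimal-distance cases) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The PoS-group weights are hypothetical placeholders, because the actual seventeen groups and their constants are not given here. The window names `"pre"`, `"post"` and `"centre"` are an assumed reading of which two neighbors form the trigram. The three distance functions are common textbook definitions, not necessarily the paper's F1 to F10. The sense labels are invented, and the final neural-network step is not sketched.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vec = Tuple[int, ...]

# Hypothetical PoS-group weights (placeholders only): the paper assigns one
# constant per group of Brown Corpus tags, but the real values are not given.
POS_GROUP_WEIGHT: Dict[str, int] = {
    "NN": 1, "NNS": 1,   # nouns
    "VB": 2, "VBD": 2,   # verbs
    "JJ": 3,             # adjectives
    "RB": 4,             # adverbs
    "IN": 5,             # prepositions
    "AT": 6,             # articles
}
UNKNOWN_WEIGHT = 0  # assumed value for unlisted tags and out-of-sentence positions

# Assumed trigram windows relative to the ambiguous word at position i.
WINDOWS: Dict[str, Tuple[int, int, int]] = {
    "pre": (-2, -1, 0),     # two preceding words + target
    "centre": (-1, 0, 1),   # one word on each side
    "post": (0, 1, 2),      # target + two following words
}


def trigram_vector(tags: Sequence[str], i: int, window: str = "post") -> Vec:
    """Map the target word and two neighbors to a 1x3 vector of PoS weights."""
    def weight(j: int) -> int:
        if 0 <= j < len(tags):
            return POS_GROUP_WEIGHT.get(tags[j], UNKNOWN_WEIGHT)
        return UNKNOWN_WEIGHT
    return tuple(weight(i + k) for k in WINDOWS[window])


def hamming(a: Vec, b: Vec) -> int:
    """Number of positions where the two vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def manhattan(a: Vec, b: Vec) -> int:
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: Vec, b: Vec) -> float:
    """Straight-line distance between the vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def nearest_cases(query: Vec, cases: List[Tuple[Vec, str]],
                  metric: Callable[[Vec, Vec], float]) -> List[str]:
    """Return the senses of all cases at minimal distance from the query."""
    scored = [(metric(query, vec), sense) for vec, sense in cases]
    best = min(d for d, _ in scored)
    return [sense for d, sense in scored if d == best]


if __name__ == "__main__":
    tags = ["AT", "NN", "VBD", "IN", "AT", "NN"]  # ambiguous noun at index 1
    query = trigram_vector(tags, 1, "post")       # (1, 2, 5)
    cases = [((1, 2, 5), "sense_a"), ((1, 2, 6), "sense_b"),
             ((1, 3, 5), "sense_c"), ((4, 4, 4), "sense_d")]  # hypothetical
    print(nearest_cases(query, cases, hamming))   # ['sense_a']
```

Because each vector has only three small integer components, Hamming distance simply counts mismatched PoS groups, whereas Manhattan and Euclidean distances also depend on how far apart the arbitrary group constants are. `nearest_cases` returns every tied case, which is the set that a later classifier (a neural network in the paper) would choose among.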

Full Text