Abstract

The term word sense disambiguation (WSD) is introduced in the context of text document processing. A knowledge-based approach is developed using the WordNet lexical ontology, describing its structure and the components used to identify the context-dependent sense of each polysemous word. The principal distance measures defined on the graph associated with WordNet are presented, and their advantages and disadvantages are analyzed. A general model for the aggregation of distances and probabilities is proposed and implemented in an application that detects the contextual sense of each word. For words that do not exist in WordNet, a similarity measure based on co-occurrence probabilities is used. The WSD module is proposed for integration into document-processing steps such as supervised and unsupervised classification, in order to maximize the correctness of the classification. Future work concerns the implementation of different domain-oriented ontologies.

Keywords: WSD, Similarity Measure, WordNet, Ontology, Synset

1 Introduction

For the acquisition of knowledge in artificial intelligence, two approaches defined in [1] are used:
* the transfer process from human to knowledge base, a process whose major disadvantage is that the person who holds the knowledge cannot easily articulate it;
* the conceptual modeling process, which builds models into which new knowledge is placed as it is acquired; this process led to the appearance of the ontology as a systematic organization of knowledge and data about reality, supporting the construction of theories about what exists.

An essential role of an ontology is to be reusable in multiple applications. Mapping two or more ontologies onto each other is called alignment. This task is particularly difficult, and it is the main cause of the limitations in extending existing ontologies [1].

The direction ontologies follow is supported by the introduction of artificial intelligence techniques that emulate the mental representation of concepts and the interconnections between them.

The kernel of the ontology is defined as the system O = (L, T, C, H, ROOT) (sketched in code below), where:
* L is the lexicon, formed of the terms of the natural language;
* C is the set of concepts;
* T is the reference function that maps the set of terms of the lexicon to the set of concepts;
* H is the hierarchy of the taxonomy, given by a directed, acyclic, transitive and reflexive relation;
* ROOT is the starting point upon which the hierarchy is built.

There are two types of ontologies, as defined in [1], depending on the area in which they are used:
* ontologies for knowledge-based systems, characterized by a relatively small number of concepts linked by many varied relationships; the concepts are grouped into complex conceptual schemes or scenarios, and each concept can have one or more customizations;
* lexicalized ontologies, comprising a large number of concepts linked by a small number of relationships, such as WordNet, whose concepts are represented by sets of synonymous words (synsets); these ontologies are used in human language processing systems (see the query sketch below).

The concept of the ontology as a knowledge base is introduced into the classification of documents, in order to analyze documents semantically by resolving the ambiguity of terms. This integration results in an improvement of the objective function defined for the classification techniques used. The main components of an ontology are described: the concepts and the relations between them.
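To make the kernel definition above concrete, a minimal sketch of it as a data structure follows. The field types and the toy entries are illustrative assumptions, not taken from the paper; note how a polysemous term maps to more than one concept, which is exactly the ambiguity WSD must resolve.

from dataclasses import dataclass

@dataclass
class OntologyKernel:
    L: set[str]             # lexicon: terms of the natural language
    T: dict[str, set[str]]  # reference function: term -> concepts it may denote
    C: set[str]             # set of concepts
    H: dict[str, str]       # direct is-a links: concept -> parent concept; the
                            # full hierarchy is their reflexive-transitive closure
    ROOT: str               # starting point upon which the hierarchy is built

# Toy instance (hypothetical concept names chosen for illustration).
kernel = OntologyKernel(
    L={"bank", "river"},
    T={"bank": {"financial_institution", "river_bank"}, "river": {"river"}},
    C={"financial_institution", "river_bank", "river", "entity"},
    H={"financial_institution": "entity", "river_bank": "entity", "river": "entity"},
    ROOT="entity",
)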
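Since WordNet is the lexicalized ontology the paper builds on, the sketch below shows how graph-based distance measures of the kind surveyed in the abstract can be queried. The specific measures shown (shortest path, Wu-Palmer, Leacock-Chodorow) and the use of NLTK's WordNet interface are assumptions for illustration; the paper's own application is not reproduced here.

# Requires: pip install nltk, then nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn

def compare_first_noun_senses(word_a: str, word_b: str) -> None:
    """Compare the first noun sense of each word with three graph-based measures."""
    s1 = wn.synsets(word_a, pos=wn.NOUN)[0]
    s2 = wn.synsets(word_b, pos=wn.NOUN)[0]
    # Shortest-path measure: inversely proportional to the is-a path length.
    print("path:", s1.path_similarity(s2))
    # Wu-Palmer: based on the depth of the least common subsumer of the senses.
    print("wup :", s1.wup_similarity(s2))
    # Leacock-Chodorow: -log of the path length scaled by the taxonomy depth.
    print("lch :", s1.lch_similarity(s2))

compare_first_noun_senses("bank", "river")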
These components are analyzed, identifying methods of extracting knowledge from them. From the relationships defined between the concepts, a graph representation is created, seen as a taxonomy of membership (is-a) linking concepts to more general ones. The senses of a concept are defined, along with the possibility of representing each sense as a graph. …
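The is-a representation of senses described in the excerpt above can be made concrete by walking WordNet's hypernym chains: each sense of a polysemous word roots itself in the taxonomy along a different path. The sketch again assumes NLTK's WordNet interface rather than the paper's own implementation.

from nltk.corpus import wordnet as wn

def sense_taxonomies(word: str) -> None:
    """Print one root-ward is-a chain for each noun sense of `word`."""
    for synset in wn.synsets(word, pos=wn.NOUN):
        # hypernym_paths() returns root-to-synset paths; reverse one so the
        # chain reads from the specific sense up toward the root (entity.n.01).
        path = synset.hypernym_paths()[0]
        print(" -> ".join(s.name() for s in reversed(path)))

sense_taxonomies("mouse")  # e.g. the rodent sense vs. the pointing-device sense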
