Abstract

AbstractThe emergence of knowledge repositories in a variety of domains provides a valuable opportunity for semantic interpretation of high dimensional datasets. Previous researches investigate the use of concept instead of word as a core semantic feature for incorporating semantic knowledge from an ontology into the representation model of documents. On the other hand, in machine learning and information retrieval, data objects are represented as a flat feature vector. The inconsistency between the structural nature of the knowledge repositories and the flat representation of features in machine learning leads researchers to neglect the structure of the knowledge base and leverage concepts as isolated semantic features, which is known as bag-of-concepts. Although, using concepts has some advantages over words, by neglecting the relation between concepts, the problem of vocabulary mismatch remains in force. In this paper, a novel semantic kernel is proposed which is capable of incorporating the relatedness between conceptual features. This kernel leverages clique theory to map data objects to a novel feature space wherein complex data objects will be comparable. The proposed kernel is relevant to all applications which have a prior knowledge about the relatedness between features. We concentrate on representing text documents and words using Wikipedia and WordNet, respectively. The experimental results over a set of benchmark datasets have revealed that the proposed kernel significantly improves the representation of both words and texts in the application of semantic relatedness.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call