Abstract

Text annotation is the procedure of initially identifying, in a segment of text, a set of dominant in meaning words and later on attaching to them extra information (usually drawn from a concept ontology, implemented as a catalog) that expresses their conceptual content in the current context. Attaching additional semantic information and structure helps to represent, in a machine interpretable way, the topic of the text and is a fundamental preprocessing step to many Information Retrieval tasks like indexing, clustering, classification, text summarization and cross-referencing content on web pages, posts, tweets etc.In this paper, we deal with automatic annotation of text documents with entities of Wikipedia, the largest online knowledge base; a process that is commonly known as Wikification. Moving similarly to previous approaches the cross-reference of words in the text to Wikipedia articles is based on local compatibility between the text around the term and textual information embedded in the article. The main contribution of this paper is a set of disambiguation techniques that enhance previously published approaches by employing both the WordNet lexical database and the Wikipedia article's PageRank scores in the disambiguation process. The experimental evaluation performed depicts that the exploitation of these additional semantic information sources leads to more accurate Text Annotation.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call