Improved text annotation with Wikipedia entities

Christos Makris, Yannis Plegas, Evangelos Theodoridis

doi:10.1145/2480362.2480425

Abstract

Text annotation is the process of first identifying, in a segment of text, a set of semantically dominant words and then attaching to them extra information (usually drawn from a concept ontology, implemented as a catalog) that expresses their conceptual content in the current context. Attaching this additional semantic information and structure helps to represent the topic of the text in a machine-interpretable way and is a fundamental preprocessing step for many Information Retrieval tasks, such as indexing, clustering, classification, text summarization, and cross-referencing content on web pages, posts, tweets, etc. In this paper, we deal with the automatic annotation of text documents with entities of Wikipedia, the largest online knowledge base, a process commonly known as Wikification. As in previous approaches, the cross-referencing of words in the text to Wikipedia articles is based on local compatibility between the text around the term and the textual information embedded in the article. The main contribution of this paper is a set of disambiguation techniques that enhance previously published approaches by employing both the WordNet lexical database and the Wikipedia articles' PageRank scores in the disambiguation process. The experimental evaluation shows that exploiting these additional semantic information sources leads to more accurate text annotation.
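
To make the idea of combining local compatibility with a link-based prior concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' method: the abstract gives no formulas, so the bag-of-words cosine similarity, the linear weighting `alpha`, the normalization of PageRank by its maximum, and the names `tokenize`, `cosine`, and `disambiguate` are all hypothetical choices. The WordNet component is left out entirely, and the candidate texts and PageRank values are made up for the example.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()\"'"


def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]


def cosine(tokens_a, tokens_b):
    """Cosine similarity between two bags of words."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(context, candidates, alpha=0.8):
    """Pick the candidate article that best fits the context.

    candidates maps an article title to (article_text, pagerank).
    The score is an assumed linear mix of local compatibility
    (cosine similarity with the context) and normalized PageRank.
    """
    if not candidates:
        raise ValueError("no candidate articles given")
    ctx = tokenize(context)
    max_pr = max(pr for _, pr in candidates.values()) or 1.0
    scores = {}
    for title, (text, pr) in candidates.items():
        local = cosine(ctx, tokenize(text))
        scores[title] = alpha * local + (1 - alpha) * (pr / max_pr)
    best = max(scores, key=scores.get)
    return best, scores


# Toy data: texts and PageRank values are invented for illustration.
candidates = {
    "Jaguar": ("The jaguar is a large cat species that hunts prey in the rainforest", 0.4),
    "Jaguar Cars": ("Jaguar is a British manufacturer of luxury cars", 0.6),
}
context = "The jaguar hunts prey in the rainforest at night"
best, scores = disambiguate(context, candidates)
print(best)
```

With the default `alpha`, the context overlap outweighs the higher PageRank of the car maker, so the animal article is selected. Setting `alpha=0` ranks candidates by PageRank alone, which shows why a link-based prior is only useful in combination with local context.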
