Abstract

Using semantic technologies for mining and intelligent information access to microblogs is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges because they are short, noisy, context-dependent, and dynamic. Semantic annotation of tweets is typically performed in a pipeline comprising successive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition, and entity disambiguation (e.g. with respect to DBpedia). Consequently, errors are cumulative, and earlier-stage problems can severely reduce the performance of the final stages. This paper presents a characterisation of genre-specific problems at each semantic annotation stage and of their impact on subsequent stages. Critically, we evaluate the impact on two high-level semantic annotation tasks: named entity detection and disambiguation. Our results demonstrate the importance of making approaches specific to the genre, and indicate a diminishing-returns effect that reduces the effectiveness of complex text normalisation.
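
The way errors accumulate along such a pipeline can be illustrated with a minimal sketch. It assumes, purely for illustration, that each stage can be summarised by a single accuracy figure and that stage errors are independent, so the end-to-end ceiling is the product of the per-stage figures. Neither assumption comes from the paper, and the function name `cumulative_accuracy` and the numbers in the usage lines are hypothetical placeholders, not reported results.

```python
from typing import List, Sequence

# Stage names follow the pipeline order given in the abstract.
STAGES = [
    "language identification",
    "tokenisation",
    "part-of-speech tagging",
    "named entity recognition",
    "entity disambiguation",
]


def cumulative_accuracy(stage_accuracies: Sequence[float]) -> List[float]:
    """Return the running product of per-stage accuracies.

    Assumption (not from the paper): stage errors are independent, so the
    fraction of inputs still handled correctly after stage k is the product
    of the accuracies of stages 1..k.
    """
    running = 1.0
    ceilings = []
    for accuracy in stage_accuracies:
        if not 0.0 <= accuracy <= 1.0:
            raise ValueError(f"accuracy must lie in [0, 1], got {accuracy}")
        running *= accuracy
        ceilings.append(running)
    return ceilings


if __name__ == "__main__":
    # Hypothetical per-stage accuracies, chosen only to show the effect.
    illustrative = [0.95, 0.95, 0.90, 0.80, 0.80]
    for stage, ceiling in zip(STAGES, cumulative_accuracy(illustrative)):
        print(f"{stage:<26} ceiling after this stage: {ceiling:.3f}")
```

Under these assumptions the ceiling can only fall from one stage to the next, which is why a modest loss early in the pipeline limits what the final disambiguation stage can achieve.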

Highlights

  • Semantic annotation is the process of tying machine-tractable semantic models to natural language text

  • Semantic annotation marks every mention in a text of a concept from the ontology, using metadata that refers to the concept's URI

  • Reliable semantic annotation of user-generated content is an enabler for other semantic technologies [4], including opinion mining [28], summarisation [38], semantic-based search, recommendation, visual analytics, and user and community modelling [41]

Summary

Introduction

Semantic annotation is the process of tying machine-tractable semantic models to natural language text. In recent years, social media – and microblogging in particular – have established themselves as high-value, high-volume content, which organisations increasingly wish to analyse automatically. Reliable semantic annotation of user-generated content is an enabler for other semantic technologies [4], including opinion mining [28], summarisation [38], semantic-based search, recommendation, visual analytics, and user and community modelling [41]. It is relevant in many application contexts [12], including knowledge management, competitor intelligence, customer relation management, eBusiness, eScience, eHealth, and eGovernment.

