Abstract

In this paper, we introduce the T-MexNeg corpus of Tweets written in Mexican Spanish. It consists of 13,704 Tweets, of which 4,895 contain negation structures. We analyze negation statements as they occur in the language used on social media. The paper presents the annotation guidelines along with a novel resource targeted at the negation detection task. The corpus was manually annotated with labels for negation cue, scope, and event. We report the inter-annotator agreement for all the components of the negation structure. This resource is freely available. Furthermore, we performed various experiments to automatically identify negation, using the T-MexNeg corpus and the SFU ReviewSP-NEG corpus to train a machine learning algorithm. By comparing two methodologies, one based on a dictionary and the other on the Conditional Random Fields algorithm, we found that the results of negation identification on Twitter are lower when the model is trained on the SFU ReviewSP-NEG corpus. Therefore, this paper shows the importance of having resources built specifically to deal with social media language.
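To make the dictionary-based methodology mentioned above more concrete, the following is a minimal sketch of cue detection by lexicon lookup, together with a token-level precision, recall, and F1 computation. It is only an illustration under stated assumptions: the cue list `NEGATION_CUES`, the tokenizer, and the function names are hypothetical and are not taken from the paper, whose actual dictionary and evaluation setup may differ.

```python
import re

# Hypothetical, deliberately small cue lexicon (an assumption for
# illustration; it is NOT the dictionary used by the authors).
NEGATION_CUES = {
    "no", "ni", "nunca", "nada", "nadie", "jamás",
    "tampoco", "sin", "ningún", "ninguna", "ninguno",
}

# Simple word tokenizer; real Tweets would need handling of
# hashtags, mentions, URLs, and emoji.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def tag_cues(tokens):
    """Label each token 'CUE' if it is in the lexicon, else 'O'."""
    return ["CUE" if tok in NEGATION_CUES else "O" for tok in tokens]


def precision_recall_f1(gold, pred):
    """Token-level scores for the 'CUE' label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == "CUE" and p == "CUE")
    fp = sum(1 for g, p in zip(gold, pred) if g != "CUE" and p == "CUE")
    fn = sum(1 for g, p in zip(gold, pred) if g == "CUE" and p != "CUE")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    tokens = tokenize("No me gusta nada este lugar")
    predicted = tag_cues(tokens)
    # Hypothetical gold labels, invented for this example only.
    gold = ["CUE", "O", "O", "O", "O", "O"]
    print(list(zip(tokens, predicted)))
    print(precision_recall_f1(gold, predicted))
```

A pure lookup of this kind labels every lexicon match as a cue regardless of context, which is one reason to compare it against a sequence model such as Conditional Random Fields that can use the surrounding tokens.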

Highlights

  • Negation is a complex phenomenon of language that shows a wide range of variation, especially in what can be called Netspeak [1] or Computer-Mediated Communication (CMC) [2]—in Spanish, ‘comunicación tecleada’ [3]

  • Table summarizes the results of the dictionary-based system and the Conditional Random Field (CRF)-based system on the task of negation detection on both corpora, T-MexNeg (TMN) and SFU ReviewSP-NEG

  • The result is an efficient detection of negation cues, minimizing the number of false positives


Summary

Introduction

Negation is a complex phenomenon of language that shows a wide range of variation, especially in what can be called Netspeak [1] or Computer-Mediated Communication (CMC) [2], known in Spanish as ‘comunicación tecleada’ [3]. To understand more fully how negation works in the language of the Internet, it is necessary to use corpora that are extracted from social media platforms and annotated for this phenomenon. Twitter is a microblogging service that illustrates the features of netspeak; it is open, and it is widely used across all types of communities. Building and studying corpora based on Twitter is therefore a convenient strategy for studying the traits of netspeak and negation.
