Abstract

While the recognition of positive/negative sentiment in text is an established task with many standard data sets and well-developed methodologies, the recognition of more nuanced affect has received less attention: there are few publicly available annotated resources, and there are a number of competing emotion representation schemes with as yet no clear way to choose between them. To address this lack, we present a series of emotion annotation studies on tweets, providing methods for comparisons between annotation methods (relative vs. absolute) and between different representation schemes. We find improved annotator agreement with a relative annotation scheme (comparisons) on a dimensional emotion model over a categorical annotation scheme on Ekman’s six basic emotions; however, when we compare inter-annotator agreement for comparisons with agreement for a rating scale annotation scheme (both with the same dimensional emotion model), we find improved inter-annotator agreement with rating scales, challenging a common belief that relative judgements are more reliable. To support these studies, and as a contribution in itself, we further present a publicly available collection of 2019 tweets annotated with scores on each of four emotion dimensions: valence, arousal, dominance and surprise, following the emotion representation model identified by Fontaine et al. in 2007.

Highlights

  • Detection of affect in online social media and other text-based sources forms an important part of understanding the behaviour and choices of people, and has found widespread application in business reputation management, understanding public preferences and choices in a political setting, as well as research into human behaviour [1,2].

  • Research effort in the recognition of affect in text has focussed to a large extent on recognition of positive/negative sentiment, while more nuanced emotion representation models have received relatively little attention.

  • We found evidence that annotations on a 5-point scale produced greater annotator agreement than comparisons, especially when considered as ordinal annotations and converted to comparisons.

Summary

Introduction

Detection of affect in online social media and other text-based sources forms an important part of understanding the behaviour and choices of people, and has found widespread application in business reputation management, understanding public preferences and choices in a political setting, as well as research into human behaviour [1,2]. Martinez et al. [19] suggest that ranked annotations should not be treated as absolute values, but instead as ordinal, and used, for example, to train ranking estimators. Another approach is to perform relative annotations directly, such as best/worst scaling, where the highest and lowest ranked tweets are chosen from a set of four [20]. Annotator agreement for final-round pairwise comparisons was similar to that for the 5-point rating scale, and when the rating-scale annotations were considered as ordinal and converted into pairwise comparisons, agreement was noticeably better. These results challenge the notion that relative human judgements are more reliable than absolute judgements. To support the comparisons of annotation methodologies, and as a contribution in itself, we present a collection of 2019 tweets annotated following the four-dimensional emotion representation scheme of Fontaine et al. [11].
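The conversion from rating-scale annotations to pairwise comparisons can be made concrete. The following Python sketch is not the authors' code: it assumes each annotator assigns an integer score on the 5-point scale to each tweet, induces a preference for every tweet pair, and measures agreement as the fraction of shared, non-tied pairs on which two annotators' preferences match. All function names and the toy data are illustrative, and the paper's own agreement metrics may differ.

    from itertools import combinations

    def ratings_to_comparisons(ratings):
        # Convert per-tweet ordinal ratings (e.g. a 1-5 scale) into
        # pairwise comparisons: for each pair of tweets, record which
        # one was rated higher, or None for a tie.
        comparisons = {}
        for (i, ri), (j, rj) in combinations(sorted(ratings.items()), 2):
            comparisons[(i, j)] = None if ri == rj else (i if ri > rj else j)
        return comparisons

    def pairwise_agreement(ratings_a, ratings_b):
        # Fraction of shared, non-tied tweet pairs on which two
        # annotators' induced comparisons express the same preference.
        comps_a = ratings_to_comparisons(ratings_a)
        comps_b = ratings_to_comparisons(ratings_b)
        shared = [p for p in comps_a
                  if p in comps_b
                  and comps_a[p] is not None and comps_b[p] is not None]
        if not shared:
            return 0.0
        return sum(comps_a[p] == comps_b[p] for p in shared) / len(shared)

    # Toy usage: two annotators rate three tweets for valence on a 1-5 scale.
    annotator_a = {"t1": 4, "t2": 2, "t3": 5}
    annotator_b = {"t1": 3, "t2": 1, "t3": 5}
    print(pairwise_agreement(annotator_a, annotator_b))  # 1.0: both imply t3 > t1 > t2

Under this view, a best/worst scaling annotation over a set of four tweets [20] can be handled the same way: choosing the best and worst items fixes five of the six pairwise comparisons in the set (only the pair between the two middle items is left undetermined), so best/worst annotations can be pooled with comparisons induced from rating scales.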

Annotator Agreement Comparisons
Annotation Difference Metrics
Relative Dimensional Annotations
Rating Scale Annotations
Annotator Agreement
Cognitive Complexity of Annotation Tasks
Data Collection and Annotation
Annotation
Pilot Study
Primary Study
Data Availability and Privacy Protection
Predictive Model
Findings
Conclusions