A versatile framework for resource-limited sentiment articulation, annotation, and analysis of short texts.

Vuk Batanović, Boško Nikolić, Miloš Cvetanović, Luis M Rocha

doi:10.1371/journal.pone.0242050


Open Access



Journal: PLoS ONE
Publication Date: Nov 12, 2020
Citations: 15
License type: CC BY 4.0

Affiliation: University of Belgrade, Centar za Promociju Nauke

Abstract

Choosing a comprehensive and cost-effective way of articulating and annotating the sentiment of a text is not a trivial task, particularly when dealing with short texts, in which sentiment can be expressed through a wide variety of linguistic and rhetorical phenomena. This problem is especially conspicuous in resource-limited settings and languages, where design options are restricted either in terms of manpower and financial means required to produce appropriate sentiment analysis resources, or in terms of available language tools, or both. In this paper, we present a versatile approach to addressing this issue, based on multiple interpretations of sentiment labels that encode information regarding the polarity, subjectivity, and ambiguity of a text, as well as the presence of sarcasm or a mixture of sentiments. We demonstrate its use on Serbian, a resource-limited language, via the creation of a main sentiment analysis dataset focused on movie comments, and two smaller datasets belonging to the movie and book domains. In addition to measuring the quality of the annotation process, we propose a novel metric to validate its cost-effectiveness. Finally, the practicality of our approach is further validated by training, evaluating, and determining the optimal configurations of several different kinds of machine-learning models on a range of sentiment classification tasks using the produced dataset.

Highlights

Sentiment analysis is one of the most popular, understandable, and applicable tasks in the field of natural language processing (NLP)
The framework for short-text sentiment articulation, annotation, and analysis that we present in this paper is suitable for resource-limited settings, since its sentiment labels simultaneously encode information regarding the polarity, subjectivity, and ambiguity of a text, as well as the presence of sarcasm or a mixture of sentiments
We first transliterate all texts written in the Serbian Cyrillic script to their Latin script equivalents, since Serbian is a digraphic language
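The transliteration step mentioned in the last highlight can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `transliterate`, the mapping table `CYR_TO_LAT`, and the capitalization rule for the digraphs lj, nj, and dž are assumptions made for illustration, based only on the standard one-to-one correspondence between the 30 letters of the Serbian Cyrillic alphabet and their Latin equivalents.

```python
# Hypothetical sketch of Serbian Cyrillic -> Latin transliteration.
# The table covers the 30 lowercase letters of the Serbian Cyrillic alphabet.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}


def transliterate(text: str) -> str:
    """Replace Serbian Cyrillic letters with Latin ones; leave all else as-is."""
    out = []
    for i, ch in enumerate(text):
        lower = ch.lower()
        lat = CYR_TO_LAT.get(lower)
        if lat is None:
            # Not a Serbian Cyrillic letter (Latin text, digits, punctuation).
            out.append(ch)
        elif ch == lower:
            out.append(lat)
        else:
            # Uppercase letter. Digraphs become "LJ" inside an all-caps word
            # and "Lj" otherwise (assumed convention).
            nxt = text[i + 1] if i + 1 < len(text) else ""
            prev = text[i - 1] if i > 0 else ""
            all_caps = nxt.isupper() or (not nxt.isalpha() and prev.isupper())
            if len(lat) == 2 and not all_caps:
                out.append(lat.capitalize())
            else:
                out.append(lat.upper())
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("Добар филм!"))
```

Because every Cyrillic letter maps to exactly one Latin letter or digraph, this direction is unambiguous, which is one reason a Cyrillic-to-Latin normalization is a convenient preprocessing choice; the reverse direction would need extra handling for digraphs.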

Summary

Introduction

Sentiment analysis is one of the most popular, understandable, and applicable tasks in the field of natural language processing (NLP). The general term sentiment analysis encompasses several specific subtasks, including polarity detection, subjectivity detection, and sarcasm detection. These tasks are often framed as binary classification problems, where the goal is to distinguish between positive and negative texts, subjective and objective texts, sarcastic and non-sarcastic texts, and so on. Many sentiment articulation schemes tackle only one of these subtasks [1,2,3,4,5,6]. The binary conception of this task is extremely simple, but surprisingly useful in real-world applications, such as social
