Abstract
This paper presents a study of sentiment analysis for Amharic social media texts. As the number of social media users keeps increasing, social media platforms would like to understand the latent meaning and sentiment of a text to improve decision-making. However, low-resource languages such as Amharic have received less attention for several reasons, including the lack of well-annotated datasets, the unavailability of computing resources, and the scarcity of expert researchers in the area. This research addresses three main research questions. First, we explore the suitability of existing tools for the sentiment analysis task. Annotation tools that support large-scale annotation tasks in Amharic are scarce, and the existing crowdsourcing platforms do not support Amharic text annotation. Hence, we build a social-network-friendly annotation tool called ‘ASAB’ using a Telegram bot. We collect 9.4k tweets, each annotated by three Telegram users. Second, we explore the suitability of machine learning approaches for Amharic sentiment analysis. The FLAIR deep learning text classifier, based on network embeddings computed from a distributional thesaurus, outperforms the other supervised classifiers. Third, we investigate the challenges in building a sentiment analysis system for Amharic and find that the widespread use of sarcasm and figurative speech is the main issue. To advance sentiment analysis research in Amharic and other related low-resource languages, we release the dataset, the annotation tool, the source code, and the models publicly under a permissive license.
Highlights
Sentiment analysis is the task of detecting the orientation of someone’s opinion and analyzing the emotions, feelings, and attitudes of a speaker or a writer in a piece of information concerning a certain situation, object, or event (Pandey and Govilkar, 2015).
K-Nearest Neighbor (KNN): KNN works by finding the nearest neighbors of a given query and using their classes to predict the class of the query (Cunningham and Delany, 2020).
We have followed the suggestions by De Souza Bermejo et al. (2019) to categorize sentiment classes into ‘positive’, ‘negative’, ‘neutral’, and ‘mixed’.
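The KNN description in the highlights above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the Euclidean distance, the choice of k, and the two-dimensional toy vectors are all assumptions made for illustration. Only the four class labels come from the text.

```python
import math
from collections import Counter

# The four sentiment classes named in the highlights.
LABELS = ("positive", "negative", "neutral", "mixed")


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training examples. `train` is a list of (vector, label) pairs.
    Ties are broken in favor of the label seen first among the neighbors,
    i.e. the label of the closer neighbor."""
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: real systems would use text features
# (for example TF-IDF or embeddings), not hand-made 2-d points.
train = [
    ((0.9, 0.1), "positive"),
    ((0.8, 0.2), "positive"),
    ((0.1, 0.9), "negative"),
    ((0.2, 0.8), "negative"),
    ((0.5, 0.5), "mixed"),
    ((0.0, 0.0), "neutral"),
]

prediction = knn_predict(train, (0.85, 0.15), k=3)
print(prediction)
```

With these toy points, the three nearest neighbors of the query are two ‘positive’ examples and one ‘mixed’ example, so the vote returns ‘positive’. The tie-breaking rule is a design choice of this sketch; other conventions, such as distance-weighted voting, are equally common.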
Summary
Sentiment analysis is the task of detecting the orientation of someone’s opinion and analyzing the emotions, feelings, and attitudes of a speaker or a writer in a piece of information concerning a certain situation, object, or event (Pandey and Govilkar, 2015). The most widely adopted approach to exploring opinions in sentiment analysis is to employ very large datasets that target products and services as well as political, economic, social, and cultural feelings (Kauffmann et al., 2019; Caetano et al., 2018; Lennox et al., 2020). The absence of well-annotated corpora and of NLP resources such as parsers and taggers still makes Amharic sentiment analysis challenging (Gezmu et al., 2018; Pandey and Govilkar, 2015).