Exploring Somali Sentiment Analysis: A Resource-Light Approach for Small-scale Text Classification

Kadar Bahar, Nehad T.A Ramaha

doi:10.59287/icaens.1069

Abstract

Sentiment analysis, a fundamental task in natural language processing (NLP), plays a crucial role in understanding the opinions and emotions people express in text. While it has been studied extensively for major languages, under-resourced languages such as Somali have received limited attention. This paper addresses that gap by proposing a resource-light approach to sentiment analysis in Somali, tailored to the language's unique characteristics and limited linguistic resources. We present a methodology that combines lexicon-based methods and feature engineering techniques to extract sentiment information from Somali text. A sentiment-annotated dataset was created through crowdsourcing and used to train and evaluate a sentiment classification model designed specifically for Somali. Experimental results show that our approach performs competitively with existing sentiment analysis techniques for under-resourced languages. The findings highlight the feasibility of sentiment analysis in Somali, even with a small-scale dataset, and shed light on the implications for sentiment analysis in other under-resourced languages. This research advances sentiment analysis capabilities for under-resourced languages, helping researchers and practitioners gain insights from sentiment information in diverse linguistic contexts.

Full Text