Abstract

Sentiment analysis of code-mixed texts has gained wide attention over the past decade from researchers and practitioners in various communities. One motivation is the growing popularity of social media, which has produced a huge volume of code-mixed text. Sentiment analysis is an interesting problem in Natural Language Processing with wide potential applications, for example understanding public concerns or aspirations regarding particular issues. This paper presents experimental results comparing the performance of lexicon-based models and Sentence-BERT as sentiment analysis models for code-mixed, low-resource texts. In this study, code-mixed texts in Bahasa Indonesia and Javanese are used as a sample of low-resource code-mixed languages. The input dataset is first translated into English using Google Machine Translation. SentiWordNet and VADER, two English sentiment lexicons, serve as the basis for predicting the sentiment category with the lexicon-based method. In addition, a pretrained Sentence-BERT model is used to classify the English-translated input text. The dataset is divided into two classes, positive and negative. Model performance was measured using accuracy, precision, recall, and F1 score. The experiments found that combining Google Machine Translation with the Sentence-BERT model achieved 83% average accuracy, 90% average precision, 76% average recall, and 83% average F1 score.
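
The sketch below illustrates the general shape of the lexicon-based step and the four evaluation metrics named above, applied to text that has already been translated into English. It is not the authors' implementation. The tiny lexicon `TOY_LEXICON`, the helper names `lexicon_label` and `binary_metrics`, the "sum of word polarities > 0 means positive" decision rule, and the example sentences are all hypothetical stand-ins. A real run would draw scores from SentiWordNet or VADER instead. Precision, recall, and F1 are computed with the positive class as the target.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon standing in for SentiWordNet/VADER polarity scores.
TOY_LEXICON: Dict[str, float] = {
    "good": 1.0, "great": 1.5, "happy": 1.0,
    "bad": -1.0, "terrible": -1.5, "sad": -1.0,
}


def lexicon_label(text: str, lexicon: Dict[str, float] = TOY_LEXICON) -> int:
    """Label translated text 1 (positive) if its summed word polarity is > 0, else 0 (negative).

    Assumed decision rule for illustration; unknown words contribute 0.
    """
    score = sum(lexicon.get(token, 0.0) for token in text.lower().split())
    return 1 if score > 0 else 0


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float, float]:
    """Return (accuracy, precision, recall, F1), treating label 1 (positive) as the target class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


if __name__ == "__main__":
    # Hypothetical English translations with hypothetical gold labels.
    texts = ["the food is good", "the service is terrible", "i am happy today", "not bad at all"]
    gold = [1, 0, 1, 1]
    preds = [lexicon_label(t) for t in texts]  # [1, 0, 1, 0]
    # accuracy 0.75, precision 1.0, recall ~0.667, F1 ~0.8
    print(binary_metrics(gold, preds))
```

The last example shows a known weakness of a plain word-sum: "not bad at all" is scored negative because the negation is ignored. VADER adds rule-based handling for negation and intensifiers, which this toy rule does not attempt.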
