Code-mixed sentiment analysis of Indonesian language and Javanese language using Lexicon based approach

C Tho, A Wibowo, L Lukas, Y Heryadi

doi:10.1088/1742-6596/1869/1/012084


Open Access



Abstract

Mixing one language with another, in spoken or written communication, has become common practice for bilingual speakers in daily conversation and on social media. The lexicon-based approach is one way to extract sentiment from text. This study compares two lexicon models, SentiWordNet and VADER, in extracting the polarity of code-mixed sentences in Indonesian and Javanese. A total of 3,963 tweets were gathered from two accounts that post code-mixed tweets. The tweets were pre-processed by removing duplicates, translating to English, filtering special characters, converting to lower case, and filtering stop words. Positive and negative word scores from each lexicon model were then combined using a simple mathematical formula to classify polarity. Compared against manual labelling, SentiWordNet performed better than VADER on negative sentiments. However, neither lexicon model performed well on neutral and positive sentiments. Overall, VADER performed better than SentiWordNet. The study attributes the misclassifications to the fact that most of the Indonesian and Javanese text consisted of words that both lexicon models considered positive.
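The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not state the scoring formula, so the rule used here (summed positive score minus summed negative score, with the sign deciding the class) is an assumption, not the authors' method. The lexicon `TOY_LEXICON`, the stop-word list `STOP_WORDS`, and the function names are hypothetical stand-ins for a real lexicon model, and the tweets are assumed to have already been translated to English.

```python
import re

# Hypothetical toy lexicon: word -> (positive score, negative score).
# A real run would look these scores up in a full lexicon model instead.
TOY_LEXICON = {
    "good": (0.75, 0.0),
    "happy": (0.875, 0.0),
    "bad": (0.0, 0.625),
    "sad": (0.0, 0.75),
}

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "is", "i", "am", "and", "this", "so"}


def preprocess(tweets):
    """Remove duplicates, lower-case, filter special characters and stop words."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        if tweet in seen:  # drop exact duplicate tweets
            continue
        seen.add(tweet)
        # Lower-case, then replace anything that is not a-z or whitespace.
        text = re.sub(r"[^a-z\s]", " ", tweet.lower())
        tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
        cleaned.append(tokens)
    return cleaned


def classify(tokens, lexicon=TOY_LEXICON):
    """Assumed rule: the sign of (sum of positive) - (sum of negative) scores."""
    pos = sum(lexicon.get(tok, (0.0, 0.0))[0] for tok in tokens)
    neg = sum(lexicon.get(tok, (0.0, 0.0))[1] for tok in tokens)
    score = pos - neg
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Tweets assumed to be already translated to English.
tweets = [
    "This is so GOOD!!!",
    "This is so GOOD!!!",  # duplicate, removed in pre-processing
    "I am sad and bad :(",
    "the weather",
]
labels = [classify(tokens) for tokens in preprocess(tweets)]
print(labels)
```

Under a rule like this, any word the lexicon scores as positive pushes the whole tweet toward the positive class, which is consistent with the misclassification pattern the abstract reports.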
