Abstract

Mixing one language with another, in speech or in writing, has become common practice for bilingual speakers in daily conversation as well as on social media. The lexicon-based approach is one of the approaches to sentiment analysis. This study compares two lexicon models, SentiWordNet and VADER, in extracting the polarity of code-mixed sentences in Indonesian and Javanese. A total of 3,963 tweets were gathered from two accounts that post code-mixed tweets. The tweets were pre-processed by removing duplicates, translating to English, filtering special characters, converting to lower case, and filtering stop words. The positive and negative word scores from each lexicon model were then combined using a simple mathematical formula to classify polarity. Compared against manual labelling, SentiWordNet performed better than VADER on negative sentiments. However, neither lexicon model performed well on neutral and positive sentiments. In overall performance, VADER did better than SentiWordNet. The study found that the misclassifications arose because most of the Indonesian and Javanese words were considered positive by both lexicon models.
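
The pre-processing and scoring steps described above can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not state the formula, so the rule used here (sum of positive word scores compared with sum of negative word scores) is an assumption, and `TOY_LEXICON`, `STOP_WORDS`, `preprocess`, and `classify` are hypothetical names with made-up scores standing in for a real lexicon. The translation step is skipped, so the input is assumed to be already in English.

```python
import re

# Hypothetical toy lexicon: word -> (positive score, negative score).
# The values are invented for illustration and come from no real lexicon.
TOY_LEXICON = {
    "good": (0.75, 0.0),
    "happy": (0.875, 0.0),
    "bad": (0.0, 0.625),
    "sad": (0.0, 0.75),
}

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "are", "i", "am", "this", "and", "so"}


def preprocess(tweets):
    """Remove duplicates, filter special characters, lower-case, drop stop words.

    Assumes the tweets were already translated to English.
    """
    seen = set()
    cleaned = []
    for tweet in tweets:
        if tweet in seen:  # remove exact duplicates
            continue
        seen.add(tweet)
        text = re.sub(r"[^A-Za-z\s]", " ", tweet)  # filter special characters and digits
        text = text.lower()                        # transform to lower case
        tokens = [t for t in text.split() if t not in STOP_WORDS]
        cleaned.append(tokens)
    return cleaned


def classify(tokens, lexicon=TOY_LEXICON):
    """Assumed rule: compare summed positive and negative word scores."""
    pos = sum(lexicon.get(t, (0.0, 0.0))[0] for t in tokens)
    neg = sum(lexicon.get(t, (0.0, 0.0))[1] for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    sample = ["This is so GOOD!!!", "This is so GOOD!!!", "I am sad :("]
    for tokens in preprocess(sample):
        print(tokens, classify(tokens))
```

Words missing from the lexicon contribute nothing to either sum, so a sentence with no known words falls through to neutral under this assumed rule. A real lexicon that assigns positive scores to many translated words would push such a rule toward positive labels, which is consistent with the misclassification pattern the abstract reports.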
