Abstract
Social media have become a public space where people interact with one another and freely express their views, a right known as freedom of speech. However, few people exercise this right with care, and many feel they can say anything on the internet, particularly on social media. Social media can therefore serve as a platform for capturing public sentiment toward a particular topic, and automatic sentiment analysis using machine learning can support this. Automatic sentiment analysis is not a straightforward task, however, and several problems must be solved. First, words can be ambiguous, particularly in certain languages. Second, context differs from one language to another. Therefore, a model built for one language is not practical for sentiment analysis in a different language. This research integrates BERT (Bidirectional Encoder Representations from Transformers) and Bi-LSTM (Bidirectional Long Short-Term Memory) models for text classification, since they are two of the most commonly used models for this task. The proposed models focus on a specific language, Indonesian. The results show that the BERT architecture with the indobertweet-base-uncased pre-trained model, combined with Bi-LSTM, achieved the best results, with 95.17%, 70.25% and 69.09% for training, validation and testing accuracy respectively, although the gap between training and validation accuracy indicates that the model is overfitted.