Sentiment analysis is a method used to measure public opinion or the emotions of a group of people with similar interests based on their reactions to an event through text, images, videos, or audio on social media. However, such online data presents several challenges that can hinder the sentiment analysis process. These challenges stem mainly from the freedom that users have to post their content. Additionally, irrelevant opinions, often referred to as fake opinions, can also arise. The Bi-LSTM approach processes input sequences bidirectionally, allowing the model to capture information from both previous and subsequent contexts. This method is well-suited for sentiment analysis tasks due to its ability to recognize language nuances and relationships between different parts of the text. This study integrates a Bi-LSTM model with FastText word embeddings to filter out irrelevant opinions considered spam. The dataset consists of 150,351 TikTok comments taken from 100 popular videos related to tourist attractions. The experimental results show that the proposed Bi-LSTM model outperforms other models such as LSTM, CNN, GRU, MD-LSTM, and Peephole LSTM, achieving a test accuracy of 89.18%. Furthermore, when slang word translation is performed to convert slang into formal words, the Bi-LSTM model shows further improvement, with test accuracy reaching 93.10%, again surpassing the baseline models. These results demonstrate the robustness of the proposed method in handling noisy and informal language, thus improving the accuracy of sentiment analysis in the context of social media. This study provides a foundation for future research to improve sentiment analysis by addressing domain-specific challenges such as data imbalance and noise in social media data.
Read full abstract