
Sentiment Analysis of Indonesian YouTube Reviews About Lesbian, Gay, Bisexual, and Transgender (LGBT) using IndoBERT Fine Tuning

Abstract

Lesbian, gay, bisexual, and transgender (LGBT) people are individuals whose sexual orientation or gender identity differs from that of the heterosexual majority. Members of the LGBT community now appear openly on social media, which is widely used both as a source of information and as a place to comment. In Indonesia, LGBT identities are still generally viewed as deviant behavior. This research was conducted to understand Indonesian society's views on LGBT as expressed on YouTube and social media. A text mining method is used to analyze the comments and classify them as pro or contra. The model used is IndoBERT, because the comments studied are written in Indonesian; IndoBERT belongs to the Bidirectional Encoder Representations from Transformers (BERT) family of models. The dataset consisted of 1,493 records. The research comprised a preprocessing stage (case folding, data cleaning, tokenization, stopword removal, stemming, and normalization), followed by a data labeling stage and, finally, a model building stage using IndoBERT fine-tuning. The accuracy achieved with IndoBERT was 74%.
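The preprocessing stages named in the abstract can be sketched in plain Python as follows. This is a minimal illustration, not the authors' pipeline: the stopword list, the slang dictionary, and the identity stemmer are hypothetical placeholders, and the steps are simply chained in the order the abstract lists them.

```python
import re

# Hypothetical, tiny resources for illustration only. A real pipeline would
# use a full Indonesian stopword list, a slang dictionary, and a real stemmer.
STOPWORDS = {"yang", "dan", "di", "itu", "ini"}
SLANG_MAP = {"gk": "tidak", "ga": "tidak", "bgt": "banget"}


def case_fold(text):
    """Case folding: lowercase the whole comment."""
    return text.lower()


def clean(text):
    """Data cleaning: drop URLs, mentions, hashtags, digits, and punctuation."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[@#]\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Tokenization: split on whitespace."""
    return text.split()


def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def stem(tokens):
    """Stemming placeholder (assumption): returns tokens unchanged."""
    return tokens


def normalize(tokens):
    """Normalization: map informal spellings to standard words."""
    return [SLANG_MAP.get(t, t) for t in tokens]


def preprocess(comment):
    tokens = tokenize(clean(case_fold(comment)))
    return normalize(stem(remove_stopwords(tokens)))


print(preprocess("Videonya BAGUS bgt!!! https://t.co/x @user ini"))
# ['videonya', 'bagus', 'banget']
```

The resulting token lists would then be labeled and passed to the fine-tuning stage; in practice a BERT-style model applies its own subword tokenizer on top of this cleaned text.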

Similar Papers
  • Research Article
  • Citations: 1
  • 10.24843/lkjiti.2024.v15.i01.p03
Sentiment Analysis of Indonesian YouTube Reviews About Lesbian, Gay, Bisexual and Transgender (LGBT) using IndoBERT Fine Tuning
  • Mar 26, 2024
  • Lontar Komputer : Jurnal Ilmiah Teknologi Informasi
  • Teddy Oswari + 4 more


  • Research Article
  • Citations: 1
  • 10.25126/jtiik.2024119096
Analisis Perbandingan Model Bert Dan Xlnet Untuk Klasifikasi Tweet Bully Pada Twitter [Comparative Analysis of BERT and XLNet Models for Classifying Bullying Tweets on Twitter]
  • Dec 10, 2024
  • Jurnal Teknologi Informasi dan Ilmu Komputer
  • Teuku Radillah + 2 more

Bullying on social media, particularly on Twitter, has become an increasingly concerning issue with significant impacts on users' mental health. To address this issue, automatic detection of tweets containing bullying content is crucial. This study compares the performance of two recent natural language processing models, BERT (Bidirectional Encoder Representations from Transformers) and XLNet, in classifying tweets that contain bullying. The methodology involves collecting a dataset of tweets labelled as bullying or non-bullying. Text preprocessing is applied to clean and prepare the data before model training. Both models were trained and tested on the same dataset, and performance was evaluated using accuracy, precision, recall, and F1-score. The results show that both models identify bullying tweets well, but XLNet outperforms BERT, with an accuracy of 95%, precision = 100%, recall = 0.87%, and F1-score = 0.88%. XLNet is able to capture more complex context and language nuances in tweets, which contributes to its higher classification accuracy. This research contributes to the field of bullying detection on social media by showing that the XLNet model is more effective than BERT. These findings can help platforms like Twitter identify and prevent bullying content, creating a safer online environment for users, and can serve as a basis for developing more sophisticated and efficient bullying detection systems in the future.
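As a reminder of how the four evaluation metrics relate to one another, here is a minimal sketch that derives them from true and predicted labels for a binary bullying classifier. The label names and the toy data are assumptions made for illustration and are not taken from the study.

```python
def binary_metrics(y_true, y_pred, positive="bullying"):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (hypothetical): 4 bullying tweets and 6 non-bullying tweets.
y_true = ["bullying"] * 4 + ["non-bullying"] * 6
y_pred = ["bullying"] * 3 + ["non-bullying"] * 6 + ["bullying"]
print(binary_metrics(y_true, y_pred))
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which makes it a quick sanity check on any reported triple.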

  • Research Article
  • Citations: 1
  • 10.21070/ijccd.v16i1.1143
Public Sentiment Analysis of the Israel-Palestine Conflict on Social Media Using BERT
  • Oct 7, 2024
  • Indonesian Journal of Cultural and Community Development
  • Syaiful Mulki Almubarok Renhoran + 1 more

General background: The Israel-Palestine conflict has drawn significant global attention, particularly in how it is perceived and discussed on social media platforms. Specific background: Understanding public sentiment surrounding such geopolitical issues is crucial for media monitoring, diplomatic efforts, and reputation management. Knowledge gap: Previous sentiment analysis studies often lack the ability to accurately handle multilingual and context-rich datasets, especially in analyzing neutral sentiments, which are commonly overlooked. This study aims to apply the Bidirectional Encoder Representations from Transformers (BERT) model to analyze public sentiment towards the Israel-Palestine conflict on the X platform, focusing on Indonesian users. Results: Using BERT, the model achieved 93% accuracy, with a precision of 0.95, recall of 0.93, and F1-score of 0.94. The model performed well in predicting positive and negative sentiments but showed room for improvement in handling neutral sentiment. Novelty: This study introduces the implementation of the BERT Transformer model for the multilingual and context-sensitive sentiment analysis of tweets, specifically addressing a high-stakes geopolitical conflict. Implications: The findings demonstrate the potential for using advanced natural language processing techniques like BERT for monitoring public opinion, brand management, and detecting societal tensions on social media, offering valuable insights for stakeholders involved in conflict resolution and diplomatic strategies. Highlights: Achieved 93% accuracy in sentiment analysis using BERT on the X platform; identified strengths in predicting positive/negative sentiments, with challenges in neutral sentiment; demonstrated BERT’s effectiveness in handling complex geopolitical social media data. Keywords: Sentiment Analysis, BERT Transformer, Israel-Palestine Conflict, Social Media, NLP

  • Research Article
  • 10.52783/jisem.v10i24s.3938
Cyber Shield: Protecting the Digital Space from Bullies
  • Mar 24, 2025
  • Journal of Information Systems Engineering and Management
  • B Vijaya Kumar

Introduction: The cyberbullying detection system combines Artificial Intelligence (AI), Natural Language Processing (NLP), and deep learning to offer a sophisticated solution for detecting toxic online interactions. In contrast to conventional keyword-based filtering, which tends to misclassify harmless content or miss implicit bullying, this system uses BERT (Bidirectional Encoder Representations from Transformers) to better comprehend the context and meaning of text. Cyberbullying can have catastrophic impacts on victims, including anxiety, depression, and social exclusion. The need for an automated, intelligent detection system has become imperative as social media keeps evolving into a central mode of communication. Objectives: Cyber Shield aims to create a highly precise and content-sensitive cyberbullying detection system based on deep learning techniques. The system is intended to improve social media surveillance by minimizing false positives and false negatives in detecting bullying. It also targets real-time content categorization, which ensures that offending interactions are reported immediately for intervention. Another major objective is to enhance detection of sarcasm, implicit bullying, and evolving slang that tends to evade conventional filtering methods. Finally, the system is designed to be scalable and flexible so that it can be used on other social media websites and in other languages. Methods: The cyberbullying identification system follows a structured method, beginning with data preprocessing to clean and normalize text data. Preprocessing involves tokenization, stop-word removal, and lemmatization. The system then uses BERT for feature extraction and context understanding: BERT interprets a word based on its relationship with the surrounding words in a sentence. The model is then further trained on labelled data containing bullying and non-bullying text samples.
The model was trained on comments, and performance indicators such as accuracy, precision, recall, and F1-score were used to evaluate it. Results: This approach enhances the detection of cyberbullying beyond what traditional models can do. Context-aware processing improves the detection of subtle and implicit forms of bullying and offensive content embedded in otherwise neutral language. In addition, the real-time processing capability allows harmful content to be identified immediately, enabling timely intervention. Conclusions: Cyber Shield is extremely useful for identifying inappropriate content on social media through real-time monitoring with contextual understanding. Further enhancements could include multilingual processing, which would improve detection accuracy and increase the availability of the system. We can also add the detection of multimedia content such as videos, audios…

  • Research Article
  • Citations: 16
  • 10.47813/2782-5280-2024-3-1-0311-0320
Bidirectional encoders to state-of-the-art: a review of BERT and its transformative impact on natural language processing
  • Mar 2, 2024
  • Информатика. Экономика. Управление - Informatics. Economics. Management
  • Rajesh Gupta

First developed in 2018 by Google researchers, Bidirectional Encoder Representations from Transformers (BERT) represents a breakthrough in natural language processing (NLP). BERT achieved state-of-the-art results across a range of NLP tasks while using a single transformer-based neural network architecture. This work reviews BERT's technical approach, performance when published, and significant research impact since release. We provide background on BERT's foundations like transformer encoders and transfer learning from universal language models. Core technical innovations include deeply bidirectional conditioning and a masked language modeling objective during BERT's unsupervised pretraining phase. For evaluation, BERT was fine-tuned and tested on eleven NLP tasks ranging from question answering to sentiment analysis via the GLUE benchmark, achieving new state-of-the-art results. Additionally, this work analyzes BERT's immense research influence as an accessible technique surpassing specialized models. BERT catalyzed adoption of pretraining and transfer learning for NLP. Quantitatively, over 10,000 papers have extended BERT and it is integrated widely across industry applications. Future directions based on BERT scale towards billions of parameters and multilingual representations. In summary, this work reviews the method, performance, impact and future outlook for BERT as a foundational NLP technique.
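The masked language modeling objective mentioned in this abstract can be illustrated with a small sketch of how training inputs are corrupted. It follows the scheme described in the original BERT paper (about 15% of tokens are selected; of those, 80% become `[MASK]`, 10% become a random token, and 10% stay unchanged). The function name, the toy vocabulary, and the per-token coin flip are simplifying assumptions, and no model is trained here.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (inputs, labels) for masked language modeling.

    labels[i] holds the original token where a prediction is required,
    and None where the position is not part of the training loss.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)               # 80%: replace with [MASK]
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random token
            else:
                inputs.append(tok)                # 10%: keep unchanged
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


# Toy sentence and vocabulary (hypothetical).
sentence = "the model reads the whole sentence in both directions".split()
vocab = sorted(set(sentence))
inputs, labels = mask_tokens(sentence, vocab, mask_prob=0.15, seed=1)
print(inputs)
print(labels)
```

Because the model must predict the original token at each selected position from both the left and the right context, this objective is what makes the learned representations bidirectional.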

  • Research Article
  • 10.46647/ijetms.2026.v10i02.060
AN APPROACH TO SENSITIVE CONTENT MODERATION USING BERT ALGORITHM
  • Apr 27, 2026
  • International Journal of Engineering Technology and Management Sciences
  • Maryala Akshay + 5 more

Hate speech is an ever-increasing menace on social media and online platforms. It covers harmful and offensive language directed at an individual or group on the basis of race, gender, religion, or other identities. The alarming spread of hate speech creates toxic environments that have serious effects on individuals, including on mental wellness and online safety. Most platforms have installed automatic systems to detect and remove hate speech, but their effectiveness is often lacking. Traditional machine learning models like LSTM (Long Short-Term Memory) have been widely used for hate speech detection. Although these are good models, they struggle to understand the deeper meaning of words and sentences, especially when the speech features sarcasm or indirect hate. We propose an improved approach using the BERT (Bidirectional Encoder Representations from Transformers) model, a state-of-the-art natural language processing model. Unlike LSTM, which processes words in sequence, BERT reads an entire sentence at once and interprets it in both directions, which makes detecting hate speech much easier even in complex and tricky sentences. BERT was trained on a dataset of social media comments containing both hateful and neutral language. The comparison of BERT with LSTM shows that hate speech can be identified more accurately and with less error using BERT, which can find nuanced patterns of hate speech that traditional models usually miss. The main aim of this project is therefore online safety: deploying a more trustworthy detection scheme specific to hate speech. BERT can help platforms minimize harmful content more effectively, creating a more secure digital space for users. This work underlines the importance of adopting modern AI techniques to address real-world issues and improve communication on the web.

  • Conference Article
  • Citations: 10
  • 10.1109/hnicem54116.2021.9731956
Classification of Fire Related Tweets on Twitter Using Bidirectional Encoder Representations from Transformers (BERT)
  • Nov 28, 2021
  • Jairus Mingua + 2 more

Bidirectional Encoder Representations from Transformers (BERT) is a transfer learning approach in natural language processing (NLP). BERT comes in several pre-trained variants, each providing a language representation that can be fine-tuned on a task-specific dataset, such as one for text classification, to produce state-of-the-art predictions. Recent studies on the use of BERT in NLP have highlighted that there are no publicly available Filipino tweet datasets on fire reports from social media, which has led to a lack of classification models. This paper aims to design and implement a system to classify Filipino tweets using different pre-trained BERT models. After building a model dedicated to classifying Filipino tweets, using a dataset of 2,081 tweets that contains fire-related tweets, the researchers were able to compare the accuracy of the different fine-tuned pre-trained BERT models. The data shows a significant difference in the accuracy of each pre-trained BERT model. The most accurate is the BERT Base Uncased WWM model, with a test accuracy of 87.50% and a train loss of 0.06. The least accurate is the BERT Base Cased WWM model, with a test accuracy of 76.34% and a train loss of 0.2. The results show that the BERT Base Uncased WWM model can be a reliable model for classifying fire-related tweets. The accuracy obtained by the models may vary depending on the size of the dataset.

  • Research Article
  • Citations: 3
  • 10.1016/j.procs.2023.01.444
Intelligent Identification of Hate Speeches to address the increased rate of Individual Mental Degeneration
  • Jan 1, 2023
  • Procedia Computer Science
  • Lamima Tabassum Ava + 6 more


  • Research Article
  • Citations: 91
  • 10.1016/j.compenvurbsys.2022.101824
VictimFinder: Harvesting rescue requests in disaster response from social media with BERT
  • May 17, 2022
  • Computers, Environment and Urban Systems
  • Bing Zhou + 8 more


  • Research Article
  • Citations: 1
  • 10.55606/isaintek.v6i1.103
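Analisis Sentimen Masyarakat Pengguna Media Sosial Twitter Terhadap Motogp Mandalika Lombok Menggunakan Metode Bidirectional Encoder Representation From Transformers (BERT) [Sentiment Analysis of Twitter Users Toward the MotoGP Mandalika Lombok Using the Bidirectional Encoder Representation From Transformers (BERT) Method]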
  • May 20, 2023
  • Jurnal Informasi, Sains dan Teknologi
  • Nelly Sofi + 2 more

The MotoGP One race at Mandalika, Lombok, West Nusa Tenggara, which was held on March 18 2022, received many responses from the public on social media, especially Twitter. Some people agreed with holding MotoGP in Mandalika and others disagreed. To find out which responses agree or disagree, a method is needed that can process tweet data using sentiment analysis. Using BERT (Bidirectional Encoder Representations from Transformers) for sentiment analysis provides a bidirectional language model that can take into account the context of all words in a sentence. The dataset goes through preprocessing stages, namely case folding, data cleaning, tokenization, normalization, and stopword removal, before sentiment analysis is carried out. This study uses the following hyperparameters: a batch size of 32, the Adam optimizer with a learning rate of 3e-6 (0.000003), and 25 epochs. The evaluation of the model yields an accuracy of 55%. Precision is 56% for positive, 59% for neutral, and 44% for negative. Recall is 74% for positive, 29% for neutral, and 54% for negative. F1-score is 64% for positive, 38% for neutral, and 48% for negative.
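The per-class F1-scores reported above can be cross-checked from the reported precision and recall, since F1 is their harmonic mean. The short sketch below does this; the dictionary layout is an assumption for illustration, and the numbers are the rounded percentages from the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per class, as stated in the abstract.
reported = {
    "positive": (0.56, 0.74, 0.64),
    "neutral": (0.59, 0.29, 0.38),
    "negative": (0.44, 0.54, 0.48),
}

for label, (p, r, f1_reported) in reported.items():
    f1 = f1_score(p, r)
    print(f"{label}: recomputed F1 = {f1:.3f}, reported = {f1_reported:.2f}")
```

The recomputed values agree with the reported ones to within about one percentage point; the small gap for the neutral class is likely due to the precision and recall themselves being rounded.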

  • Research Article
  • Citations: 1
  • 10.47738/jads.v5i4.302
Environment Sentiment Analysis of Bali Coffee Shop Visitors Using Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer 2 (GPT2) Model
  • Dec 1, 2024
  • Journal of Applied Data Sciences
  • Ni Putu Widya Yuniari

Bali is one of the provinces with the most abundant natural and cultural wealth in Indonesia. One commodity that supports it is coffee. Bali Coffee is not only a gastronomic identity but also a cultural identity, which gives it added value for development into various business lines. One promising business derivative is the coffee shop. However, these favorable conditions need to be maintained to ensure that good quality reaches consumers. One way to do so is to analyze customer reviews, and one of the most popular methods for this is sentiment analysis. This technique allows businesses to analyze customer reviews on social media, providing feedback for maintaining and improving quality and good relationships with customers. This research aims to create a machine learning model to analyze customer reviews of several coffee shops in Bali, which are divided into three labels: positive, negative, and neutral. The methods used are scraping, cleaning, stopword removal, embedding, undersampling, and modeling. The algorithms used are Bidirectional Encoder Representation from Transformer (BERT) and Generative Pre-trained Transformers (GPT). The performance metrics used in this research are precision, recall, accuracy, and loss. This research succeeded in creating a sentiment analysis model for coffee shop customers in Bali. The BERT model without undersampling obtained an accuracy of 78%, with a loss of 0.27 in the 10th iteration, while the BERT model with undersampling obtained an accuracy of 32.85%, with a loss of 0.16 in the 10th iteration. The GPT2 model without undersampling obtained an accuracy of 78%, with a loss of 0.25 in the 10th iteration, while the GPT2 model with undersampling obtained an accuracy of 32.85%, with a loss of 0.15 in the 10th iteration.
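The undersampling step mentioned in this abstract can be sketched as random undersampling, in which every class is reduced to the size of the smallest one. The abstract does not say which undersampling variant was used, so this is an assumption; the function name and the toy reviews are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so that every class has as many items as the smallest class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    target = min(len(items) for items in by_label.values())
    balanced = []
    for label, items in by_label.items():
        for sample in rng.sample(items, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced


# Toy reviews (hypothetical): 5 positive, 2 negative, 3 neutral.
reviews = [f"review {i}" for i in range(10)]
labels = ["positive"] * 5 + ["negative"] * 2 + ["neutral"] * 3
balanced = random_undersample(reviews, labels)
print(Counter(label for _, label in balanced))
```

Undersampling balances the classes at the cost of discarding data, so the resulting training set can be much smaller than the original when one class is rare.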

  • Research Article
  • Citations: 7
  • 10.1108/k-01-2024-0103
EduChatbot: Implementing educational Chatbot for assisting the teaching-learning process by NLP-based hybrid heuristic adopted deep learning framework
  • Jul 23, 2024
  • Kybernetes
  • B Maheswari + 1 more

Purpose: A new Chatbot system is implemented to provide both voice-based and text-based communication to address student queries without delay. Initially, the input texts are gathered from the chat and then passed through pre-processing techniques such as tokenization, stemming of words, and removal of stop words. The pre-processed data are then given to Natural Language Processing (NLP) for feature extraction, where XLnet and Bidirectional Encoder Representations from Transformers (BERT) are used to extract the features. From these extracted features, the target-based fused feature pools are obtained. Intent detection is then carried out to extract the answers related to the user queries via Enhanced 1D-Convolutional Neural Networks with Long Short Term Memory (E1DCNN-LSTM), where the parameters are optimized using Position Averaging of Binary Emperor Penguin Optimizer with Colony Predation Algorithm (PA-BEPOCPA). Finally, the answers are extracted based on the intent from a particular student’s teaching materials such as video, image, or text. The implementation results are compared with different recently developed Chatbot detection models to validate the effectiveness of the newly developed model. Design/methodology/approach: A smart NLP model is developed to give education-related institutions an easy way for students and teachers to interact, with highly accurate predictions for a given query. This research work aims to design a new educational Chatbot to assist the teaching-learning process with NLP. The input data are gathered from the user through chats and given to the pre-processing stage, where tokenization, stemming of words, and removal of stop words are used. The output data from the pre-processing stage is given to the feature extraction phase, where XLnet and BERT are used.
In this feature extraction, the optimal features are extracted using hybrid PA-BEPOCPA to maximize the correlation coefficient. The features from XLnet and the features from BERT are given to the target-based fused feature pool to produce optimal features. Here, the best features are optimally selected using the developed PA-BEPOCPA to maximize the correlation among coefficients. The selected features are given to E1DCNN-LSTM to implement the educational Chatbot with high accuracy and precision. Findings: The investigation results show that the implemented model achieves maximum accuracy of 57% more than Bidirectional long short-term memory (BiLSTM), 58% more than One-Dimensional Convolutional Neural Network (1DCNN), 59% more than LSTM, and 62% more than Ensemble for the given dataset. Originality/value: The prediction accuracy of the proposed deep learning-based educational Chatbot system was high when compared with various baseline works.

  • Research Article
  • Citations: 8
  • 10.33166/aetic.2024.03.003
Enhancing Depressive Post Detection in Bangla: A Comparative Study of TF-IDF, BERT and FastText Embeddings
  • Jul 1, 2024
  • Annals of Emerging Technologies in Computing
  • Saad Ahmed Sazan + 2 more

Due to the massive adoption of social media, detecting users’ depression through social media analytics is of significant importance, particularly for underrepresented languages such as Bangla. This study introduces a well-grounded approach to identifying depressive social media posts in Bangla by employing advanced natural language processing techniques. The dataset used in this work, annotated by domain experts, includes both depressive and non-depressive posts, ensuring high-quality data for model training and evaluation. To address the prevalent issue of class imbalance, we utilised random oversampling for the minority class, thereby enhancing the model's ability to accurately detect depressive posts. We explored various numerical representation techniques, including Term Frequency – Inverse Document Frequency (TF-IDF), Bidirectional Encoder Representations from Transformers (BERT) embedding, and FastText embedding, by integrating them with a deep learning-based Convolutional Neural Network-Bidirectional Long Short-Term Memory (CNN-BiLSTM) model. The results obtained through extensive experimentation indicate that the BERT approach performed better than the others, achieving an F1-score of 84%. This indicates that BERT, in combination with the CNN-BiLSTM architecture, effectively recognises the nuances of Bangla texts relevant to depressive content. Comparative analysis with existing state-of-the-art methods demonstrates that our approach with BERT embedding performs better than others in terms of evaluation metrics and the reliability of dataset annotations. Our research significantly contributes to the development of reliable tools for detecting depressive posts in the Bangla language. By highlighting the efficacy of different embedding techniques and deep learning models, this study paves the way for improved mental health monitoring through social media platforms.
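Of the three representations compared in this abstract, TF-IDF is the simplest to write out. The sketch below uses one common variant (relative term frequency multiplied by the natural log of the inverse document frequency); the study may have used a different weighting, and the toy English documents stand in for tokenized Bangla posts.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = (term count / document length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Toy tokenized posts (hypothetical).
posts = [["sad", "day"], ["happy", "day"]]
for vector in tfidf(posts):
    print(vector)
```

A term that occurs in every document, such as "day" here, receives a weight of zero, which is how TF-IDF discounts words that do not help distinguish one post from another. Unlike BERT embeddings, these weights ignore word order and context.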

  • Research Article
  • 10.15294/7h63ma50
Sentiment Analysis on Twitter Social Media Regarding Covid-19 Vaccination with Naive Bayes Classifier (NBC) and Bidirectional Encoder Representations from Transformers (BERT)
  • Sep 30, 2024
  • Recursive Journal of Informatics
  • Angga Riski Dwi Saputra + 1 more

The Covid-19 vaccine is an important tool to stop the Covid-19 pandemic; however, the public has expressed both support for and opposition to it. Purpose: These responses were conveyed by the public in many ways, one of which is through social media such as Twitter. Responses given by the public regarding the Covid-19 vaccination can be analyzed and categorized into responses with positive, neutral, or negative sentiments. Methods: In this study, sentiment analysis of Covid-19 vaccination tweets was carried out using the Naïve Bayes Classifier (NBC) and Bidirectional Encoder Representations from Transformers (BERT) algorithms. The data used in this study is public tweet data regarding the Covid-19 vaccination, totalling 29,447 tweets in English. Result: Sentiment analysis begins with preprocessing the dataset for data normalization and data cleaning before classification. Word vectorization was then performed with TF-IDF, and data classification was performed using the NBC and BERT algorithms. From the classification results, an accuracy of 73% was obtained for the NBC algorithm and 83% for the BERT algorithm. Novelty: A direct comparison between classical models such as NBC and modern deep learning models such as BERT offers new insights into the advantages and disadvantages of both approaches in processing Twitter data. Additionally, this study proposes temporal sentiment analysis, which allows evaluating changes in public sentiment regarding vaccination over time. Another innovation is the implementation of a hybrid approach to data cleansing that combines traditional methods with the natural language processing capabilities of BERT, which more effectively addresses typical Twitter data issues such as slang and spelling errors.
Finally, this research also expands sentiment classification to be multi-label, identifying more specific sentiment categories such as trust, fear, or doubt, which provides a deeper understanding of public opinion.
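For readers unfamiliar with the classical baseline in this comparison, the following is a minimal multinomial Naive Bayes sketch with Laplace (add-one) smoothing. It works on raw word counts rather than the TF-IDF vectors the study describes, and the class name and toy tweets are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial Naive Bayes over tokenized documents, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
            self.vocab.update(doc)
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(count / n_docs)  # log prior
            for word in doc:
                if word in self.vocab:  # ignore words never seen in training
                    likelihood = (self.word_counts[label][word] + 1) / (total_words + vocab_size)
                    score += math.log(likelihood)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy tokenized tweets (hypothetical).
train_docs = [["good", "vaccine"], ["safe", "good"],
              ["bad", "side", "effects"], ["bad", "fear"]]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayesClassifier().fit(train_docs, train_labels)
print(model.predict(["good"]))
print(model.predict(["bad", "fear"]))
```

The classifier treats every word as independent given the class, which is the main contrast with BERT, where each token's representation depends on the whole sentence.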

  • Research Article
  • Citations: 1
  • 10.15294/rji.v2i2.67502
Sentiment Analysis on Twitter Social Media Regarding Covid-19 Vaccination with Naive Bayes Classifier (NBC) and Bidirectional Encoder Representations from Transformers (BERT)
  • Sep 30, 2024
  • Recursive Journal of Informatics
  • Angga Riski Dwi Saputra + 1 more

