Pengaruh Text Preprocessing terhadap Analisis Sentimen Komentar Masyarakat pada Media Sosial Twitter (Studi Kasus Pandemi COVID-19)

Syifa Khairunnisa, Said Al Faraby, Adiwijaya Adiwijaya

doi:10.30865/mib.v5i2.2835

Abstract

COVID-19 is a pandemic that has troubled many people and has prompted a large volume of public comments on Twitter. These comments can be used for sentiment analysis to determine whether the polarity of the expressed sentiment is positive, negative, or neutral. The problem with Twitter data is that tweets still contain many non-standard words, such as abbreviations, because of the limit on the number of characters allowed in a single tweet. Preprocessing is the most important initial stage in sentiment analysis of Twitter data because it affects classification performance. This study focuses on preprocessing techniques, running several test scenarios with different combinations of techniques to determine which combination produces the best accuracy and how preprocessing affects sentiment analysis. Features are extracted using N-Gram and weighted using TF-IDF, and Mutual Information is used as the feature selection method. The classification method is SVM because it can classify high-dimensional data such as the text data used in this study. The results indicate that the best performance is obtained by two combinations, (1) cleaning and stemming, and (2) word normalization, cleaning, and stemming, both with an accuracy of 77.77%. Unigrams yield higher accuracy than bigrams. Mutual Information reduces overfitting by removing irrelevant features, so that train and test accuracy are fairly stable.

Highlights

COVID-19 (Corona Virus Disease) is an infectious disease caused by the most recently discovered coronavirus
This is because Twitter limits each posted tweet to a maximum of 140 characters
And applying feature extraction that combines n-grams, so that more information is produced and classification errors such as those seen in the bigram test can be minimized

Summary

INTRODUCTION

COVID-19 (Corona Virus Disease) is an infectious disease caused by the most recently discovered coronavirus. Test results showed that applying the preprocessing techniques of cleaning, case folding, and stemming, without stopword removal, improved system accuracy, reaching 94.24%. According to this study, preprocessing is an important step in sentiment analysis because choosing the right preprocessing techniques can improve classification performance. Previous studies did not discuss the effect of the various preprocessing techniques used, and it is not yet known which combination of preprocessing techniques yields optimal sentiment analysis performance. This study focuses on applying various preprocessing techniques in order to determine the effect of preprocessing on sentiment analysis performance. The preprocessing techniques used in this study follow previous research, namely case folding, cleaning, stopword removal, and stemming, with the addition of word normalization to handle the non-standard words that frequently appear in Twitter data. This study also aims to analyze how the use of unigram and bigram features affects the accuracy of sentiment analysis.
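The preprocessing techniques named above can be treated as interchangeable steps, which makes it easy to test different combinations. The following is a minimal sketch of that idea, not the authors' implementation: the slang dictionary, the stopword list, the function names, and the sample tweet are all hypothetical, and stemming is left out because it needs a language-specific stemmer that this summary does not specify.

```python
import re

# Hypothetical toy lexicons for illustration only; the paper's actual
# normalization dictionary and stopword list are not given in this summary.
SLANG = {"yg": "yang", "gak": "tidak", "bgt": "banget"}
STOPWORDS = {"yang", "di", "dan", "ini"}


def case_folding(text):
    """Lowercase the whole tweet."""
    return text.lower()


def cleaning(text):
    """Remove URLs, mentions, and every character that is not a letter or space."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def normalization(text):
    """Replace non-standard words with their standard form via the toy lexicon."""
    return " ".join(SLANG.get(tok, tok) for tok in text.split())


def stopword_removal(text):
    """Drop tokens that appear in the toy stopword list."""
    return " ".join(tok for tok in text.split() if tok not in STOPWORDS)


def preprocess(text, steps):
    """Apply the chosen preprocessing steps in order (one test scenario)."""
    for step in steps:
        text = step(text)
    return text


tweet = "Pandemi ini bikin capek bgt @teman https://t.co/xyz #COVID19"
scenario = [case_folding, cleaning, normalization, stopword_removal]
result = preprocess(tweet, scenario)
print(result)
```

Passing a different list of steps to `preprocess` corresponds to a different test scenario, for example `[cleaning]` alone or `[case_folding, normalization]`.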

RESEARCH METHODOLOGY

Dataset

Preprocessing

N-Gram Feature Extraction
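The abstract states that features are extracted as N-Grams and weighted with TF-IDF. As a rough illustration, the sketch below builds word n-grams and applies the textbook weight tf × log(N / df). The exact TF-IDF variant used in the paper is not given in this summary, so the formula, the function names, and the sample documents are assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the word n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, n_values=(1,)):
    """Collect n-grams for every n in n_values, e.g. (1, 2) combines unigrams and bigrams."""
    tokens = text.split()
    feats = []
    for n in n_values:
        feats.extend(ngrams(tokens, n))
    return feats


def tfidf(docs, n_values=(1,)):
    """Weight each feature by raw term frequency times log(N / document frequency)."""
    feats_per_doc = [extract_features(d, n_values) for d in docs]
    df = Counter()
    for feats in feats_per_doc:
        df.update(set(feats))  # count each feature once per document
    n_docs = len(docs)
    weights = []
    for feats in feats_per_doc:
        tf = Counter(feats)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


# Hypothetical, already preprocessed documents.
docs = ["vaksin covid aman", "covid bikin takut", "vaksin bikin tenang"]
unigram_weights = tfidf(docs, n_values=(1,))
bigram_weights = tfidf(docs, n_values=(2,))
print(unigram_weights[0])
print(bigram_weights[0])
```

With this weighting, a feature that occurs in every document gets weight zero, while a feature unique to one document gets the highest weight for a given term frequency.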

Mutual Information Feature Selection
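The abstract names Mutual Information as the feature selection method. The sketch below uses the standard definition of mutual information between a term's presence in a document and the class label, then keeps the k highest-scoring terms. The paper's exact formulation and its value of k are not stated in this summary, so the function names, the sample data, and the choice of k are assumptions.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """Mutual information (in bits) between term presence and the class label."""
    n = len(docs)
    # Joint counts of (term present?, label) over all documents.
    joint = Counter((term in doc.split(), lab) for doc, lab in zip(docs, labels))
    term_counts = Counter()
    label_counts = Counter()
    for (present, lab), c in joint.items():
        term_counts[present] += c
        label_counts[lab] += c
    mi = 0.0
    # Only observed combinations are visited, so log(0) never occurs.
    for (present, lab), c in joint.items():
        p_joint = c / n
        p_indep = (term_counts[present] / n) * (label_counts[lab] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


def select_top_k(docs, labels, k):
    """Rank every vocabulary term by mutual information and keep the top k."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    ranked = sorted(vocab, key=lambda t: mutual_information(docs, labels, t), reverse=True)
    return ranked[:k]


# Hypothetical labeled documents.
docs = ["vaksin aman", "vaksin bagus", "covid takut", "covid parah"]
labels = ["pos", "pos", "neg", "neg"]
print(mutual_information(docs, labels, "vaksin"))
print(select_top_k(docs, labels, 2))
```

Terms that appear evenly across classes score near zero and are dropped, which is the sense in which feature selection removes irrelevant features.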

RESULTS AND DISCUSSION

Testing the Effect of Preprocessing

Testing the Effect of Mutual Information

Testing the Comparison of Unigram and Bigram

Findings

CONCLUSION

Journal: JURNAL MEDIA INFORMATIKA BUDIDARMA	Publication Date: Apr 25, 2021
Citations: 8	License type: CC BY 4.0
