A PRELIMINARY STUDY OF SENTIMENT ANALYSIS ON COVID-19 NEWS: LESSON LEARNED FROM DATA ACQUISITION, PRE-PROCESSING, AND DESCRIPTIVE ANALYTICS

Khairil Anwar Notodiputro, Kusman Sadik, Rahmatin Nur Amalia

doi:10.30598/barekengvol17iss4pp1901-1914

Abstract

Sentiment analysis is a method for analyzing opinions and feelings expressed in text. Its goal is to determine whether a document conveys a positive or negative emotion. As Covid-19 cases spread, news related to Covid-19 frequently became a trending topic in the mass media. Conducting sentiment analysis on all of this news is challenging because of the time and cost involved. Therefore, a sampling method is needed to obtain representative news for the analysis. Web scraping was employed to collect news articles about Covid-19 in Indonesia. To select representative news, two-step sampling was employed, combining stratified and systematic random sampling. According to the topic modelling results using lambda 0.6, the news articles are grouped into three topics: updates on Covid-19 cases, vaccination, and government policy. In addition, based on the number of positive and negative words, the news articles are grouped into news dominated by positive words, news dominated by negative words, and news with the same number of positive and negative words. Several methods have been developed for representing text in numerical form, among them tf-idf weighting and word embedding. The tf-idf method does not take word order or meaning into account; it relies only on the frequency of words, both locally (within a document) and globally (across the documents). Furthermore, this method produces a vector as large as the number of unique words in the documents, so it is less effective when many documents are used. In contrast, the vector generated by the word2vec method is smaller than the number of unique words in the corpus. In addition, word2vec considers the context of the words in the corpus.
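The two properties of tf-idf noted above (insensitivity to word order, and a vector length equal to the number of unique words) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `tfidf_vectors`, the whitespace tokenizer, the plain `tf * log(N / df)` weighting, and the three toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) for a list of non-empty text documents.

    Hypothetical helper: uses relative term frequency as the local weight
    and idf = log(N / df) as the global weight.
    """
    # Naive tokenization by lowercasing and splitting on whitespace (assumption).
    tokenized = [doc.lower().split() for doc in docs]
    # The vocabulary is every unique word across all documents.
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against an empty document
        # One entry per vocabulary word, so the vector length equals len(vocab).
        vectors.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


# Toy documents (invented for illustration, not taken from the study's data).
docs = [
    "covid cases rise",
    "rise cases covid",
    "vaccine policy announced",
]
vocab, vectors = tfidf_vectors(docs)
```

In this sketch the first two documents contain the same words in a different order and receive identical vectors, and every vector has one entry per vocabulary word, so the dimension grows as more documents add new words.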
