TOPIC MODELING USING THE LATENT DIRICHLET ALLOCATION METHOD ON WIKIPEDIA PANDEMIC COVID-19 DATA IN INDONESIA

Hindriyanto Dwi Purnomo, Wilujeng Ayu Nawang Sari

doi:10.20884/1.jutif.2022.3.5.321

Open Access

Abstract

Wikipedia is a web-based encyclopedia that people use to search for information. A gap has been identified in one Wikipedia article: its content on the COVID-19 pandemic in Indonesia has not yet been clustered by topic. This research uses the Latent Dirichlet Allocation (LDA) method, which is currently the most widely used topic modeling method. The dataset consists of 6,658 English words, and the occurrences of every word are counted to build a corpus. The study applies LDA topic modeling and shows how it can be used to analyze COVID-19 data taken from Wikipedia. LDA clusters the data according to how often words appear in the corpus, with the number of clusters (topics) and the number of iterations set for each run. The purpose of this study is to use LDA to classify the information contained in the Wikipedia article so that the results can serve as evaluation material for improving Wikipedia's services and the management of its content. LDA first assigns every word to a topic semi-randomly; then, at each iteration, it computes the probability of each topic in the dataset and the probability of each word within each topic. In this study, five iteration tests were run on the topic model with different numbers of topics. Analysis of the final results identified a single number of topics that gives the best results, in which the most frequently discussed topic concerns health.
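The abstract describes the LDA procedure only in words: words are counted into a corpus, every word is first assigned to a topic semi-randomly, and topic and word probabilities are recomputed at each iteration. The sketch below illustrates that loop using collapsed Gibbs sampling, one common way to fit LDA, in plain Python. It is not the authors' implementation; the abstract does not say which library or inference method they used. The function names (`build_corpus`, `lda_gibbs`), the whitespace tokenizer, the hyperparameters `alpha` and `beta`, and the toy documents are all illustrative assumptions.

```python
import random
from collections import Counter


def build_corpus(docs):
    """Tokenize on whitespace, map each word to an integer id, and count words.

    Returns (vocab, corpus, bow): vocab maps word -> id, corpus holds each
    document as a list of word ids, and bow holds per-document word counts.
    """
    vocab = {}
    corpus = []
    for doc in docs:
        ids = [vocab.setdefault(word, len(vocab)) for word in doc.lower().split()]
        corpus.append(ids)
    bow = [Counter(ids) for ids in corpus]
    return vocab, corpus, bow


def lda_gibbs(corpus, n_topics, n_words, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling (an assumed choice); return (phi, theta)."""
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in corpus]          # tokens of topic k in document d
    nkw = [[0] * n_words for _ in range(n_topics)]  # times word w is assigned topic k
    nk = [0] * n_topics                             # total tokens assigned topic k
    z = []                                          # current topic of every token

    # Initialization: give every token a random topic and record the counts.
    for d, doc in enumerate(corpus):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(z_d)

    topics = range(n_topics)
    for _ in range(iterations):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta);
                # the document-length denominator is the same for all t, so it is dropped.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + n_words * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Point estimates: word distribution per topic (phi), topic mix per document (theta).
    phi = [[(nkw[k][w] + beta) / (nk[k] + n_words * beta) for w in range(n_words)]
           for k in topics]
    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
             for d, doc in enumerate(corpus)]
    return phi, theta


if __name__ == "__main__":
    # Hypothetical toy documents, not the study's Wikipedia data.
    docs = [
        "vaccine health hospital vaccine patients",
        "hospital patients health doctors vaccine",
        "economy jobs market economy trade",
        "market trade jobs economy business",
    ]
    vocab, corpus, bow = build_corpus(docs)
    id2word = {i: w for w, i in vocab.items()}
    phi, theta = lda_gibbs(corpus, n_topics=2, n_words=len(vocab), iterations=200)
    for k, row in enumerate(phi):
        top = sorted(range(len(row)), key=row.__getitem__, reverse=True)[:3]
        print(f"topic {k}:", [id2word[i] for i in top])
```

Because the sampler is stochastic, the top words per topic depend on `seed`, `iterations`, and `n_topics`. Comparing several such runs mirrors, in miniature, the study's tests over different iteration counts and numbers of topics.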