Discovering Computer Science Research Topic Trends using Latent Dirichlet Allocation

Kartika Rizqi Nastiti, Ahmad Fathan Hidayatullah, Ahmad Rafie Pratama

doi:10.15575/join.v6i1.636


Open Access


Journal: Jurnal Online Informatika
Publication Date: Jun 17, 2021
Citations: 2
License type: CC BY-NC-ND 4.0

Affiliation: Islamic University of Indonesia

Abstract

Before starting a research project, researchers need to identify the trends and the state of the art in their field. This is not always easy, partly because few tools can filter the relevant literature by time range. This study addresses that problem by applying topic modeling to data scraped from Google Scholar for the years 2010 to 2019. We used Latent Dirichlet Allocation (LDA) combined with Term Frequency-Inverse Document Frequency (TF-IDF) to build the topic models, and used coherence scores to determine the number of topics in each year's data. We also visualized the topic interpretations, the word distribution of each topic, and word relevance using word clouds and PyLDAvis. In future work, we expect to add features that show the relevance and interconnections between topics, making the tool easier for researchers to use in their own projects.
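
The two scoring steps named in the abstract, TF-IDF weighting and coherence-based selection of the number of topics, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the UMass coherence measure (the abstract does not say which measure was used), it does not fit LDA itself, and the toy documents, the candidate top-word lists, and all function names are hypothetical stand-ins for scraped titles and fitted models.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized, non-empty document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weighted


def umass_coherence(top_words, docs):
    """UMass coherence of one topic, given its top words in rank order.

    Assumes every top word occurs in at least one document.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_count(*terms):
        # Number of documents containing all of the given terms.
        return sum(all(t in d for t in terms) for d in doc_sets)

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            joint = doc_count(top_words[i], top_words[j])
            score += math.log((joint + 1) / doc_count(top_words[j]))
    return score


def best_num_topics(candidates, docs):
    """Pick the topic count whose topics have the highest mean coherence."""
    def mean_coherence(topics):
        return sum(umass_coherence(t, docs) for t in topics) / len(topics)

    return max(candidates, key=lambda k: mean_coherence(candidates[k]))


# Hypothetical toy corpus standing in for one year of scraped titles.
docs = [
    ["neural", "network", "image", "classification"],
    ["neural", "network", "speech", "recognition"],
    ["database", "query", "index", "optimization"],
    ["database", "query", "transaction", "index"],
]

# Hypothetical top words from models fitted with 2 and 3 topics.
candidates = {
    2: [["neural", "network"], ["database", "query"]],
    3: [["neural", "database"], ["network", "query"], ["image", "index"]],
}

weights = tfidf(docs)
best_k = best_num_topics(candidates, docs)
print(best_k)
```

In this toy corpus the two-topic candidate scores higher because its top words co-occur in the same documents, while the three-topic candidate mixes words from unrelated documents. In practice the candidate lists would come from LDA models fitted at each topic count for each year's data.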
