Abstract

Due to the enormous growth of information and technology, digitized texts and data are being generated at an immense rate. Identifying the main topics in a vast collection of documents manually is therefore practically impossible. Topic modeling is a statistical framework that infers the latent, underlying topics from text documents, corpora, or electronic archives through a probabilistic approach. It is a promising field in Natural Language Processing (NLP). Though many researchers have worked in this field, little significant research has been done for Bangla. In this literature review paper, we have followed a systematic approach for reviewing topic modeling studies published from 2003 to 2020. We have analyzed topic modeling methods from different aspects and identified the research gap between topic modeling in English and in Bangla. After analyzing these papers, we have identified several types of topic modeling techniques, such as Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA), Support Vector Machine (SVM), and Bi-term Topic Modeling (BTM). Furthermore, this review paper also highlights the real-world applications of topic modeling. Several evaluation methods were used to evaluate these models' performances, which we have also discussed in this study. We conclude by mentioning the broad future research scope for topic modeling in Bangla.

Highlights

  • Because of the rapid development of Information Technology (e.g., the Internet, social media, online databases, etc.), the amount of data generated has increased exponentially in recent years

  • Though Bangla is one of the most widely spoken languages in the world, topic modeling techniques and studies for it are scarce. In this Systematic Literature Review (SLR), we provide a comprehensive view of topic modeling according to the literature and of how algorithms and techniques differ between English and Bangla

  • The basic idea can be described as follows: documents consist of mixtures of topics, and each topic is modeled as a distribution over a vocabulary (Arora et al., 2013); see the sketch after this list

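To make this idea concrete, here is a minimal sketch in Python, assuming scikit-learn is available; the toy corpus, number of topics, and hyperparameters are illustrative and not drawn from the reviewed studies.

    # Minimal sketch of "documents are mixtures of topics, topics are
    # distributions over a vocabulary", using scikit-learn's LDA.
    # Toy corpus and parameter values are illustrative only.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = [
        "the match ended in a draw after extra time",
        "the striker scored twice in the second half",
        "the central bank raised interest rates again",
        "inflation and interest rates dominated the budget debate",
    ]

    # Bag-of-words counts: rows are documents, columns are vocabulary terms.
    vectorizer = CountVectorizer(stop_words="english")
    X = vectorizer.fit_transform(docs)

    # Fit a 2-topic LDA model.
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    doc_topics = lda.fit_transform(X)   # per-document topic mixtures
    topic_words = lda.components_       # per-topic word weights

    vocab = vectorizer.get_feature_names_out()
    for k, weights in enumerate(topic_words):
        top = [vocab[i] for i in weights.argsort()[::-1][:5]]
        print(f"Topic {k}: {', '.join(top)}")
    print("Document-topic mixtures:\n", doc_topics.round(2))

The printed output lists the highest-weighted vocabulary terms per topic and, for each document, its inferred mixture over the two topics.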

Summary

Introduction

Because of the rapid development of Information Technology (e.g., the Internet, social media, online databases, etc.), the amount of data generated has increased exponentially in recent years. This vast accumulation of data provides essential support for training machine learning models and easy access via search engine queries. According to a study by DOMO (a cloud-based business service system), roughly 2.5 quintillion bytes of data are produced daily, and 90% of the data in the world has been created in the last two years alone (as of 2018) (Al Helal and Mouhoub, 2018). It is not feasible for any person to sift useful information from these vast amounts of data manually. A few of the topic modeling methods used in our reviewed papers are described in brief here.
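As a concrete reference point for those methods, the following minimal sketch fits LDA, one of the techniques named in the abstract, and scores it with topic coherence, one kind of evaluation measure discussed in this review. It assumes the gensim library; the tokenized toy corpus and parameter values are illustrative rather than taken from the reviewed studies, and a real Bangla pipeline would additionally need language-specific tokenization and stop-word removal.

    # Minimal sketch: fitting LDA with gensim and scoring it with topic coherence.
    # The tokenized toy corpus and all parameter values are illustrative only.
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel
    from gensim.models.coherencemodel import CoherenceModel

    texts = [
        ["government", "passes", "new", "education", "budget"],
        ["students", "protest", "education", "policy", "reform"],
        ["team", "wins", "cricket", "series", "final"],
        ["captain", "praises", "bowlers", "after", "series", "win"],
    ]

    dictionary = Dictionary(texts)                    # term <-> id mapping
    corpus = [dictionary.doc2bow(t) for t in texts]   # bag-of-words per document

    lda = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=2, passes=10, random_state=0)

    # Top words per topic.
    for topic_id, words in lda.show_topics(num_topics=2, num_words=5, formatted=False):
        print(topic_id, [w for w, _ in words])

    # One common evaluation: UMass topic coherence computed on the same corpus.
    coherence = CoherenceModel(model=lda, corpus=corpus,
                               dictionary=dictionary, coherence="u_mass").get_coherence()
    print("u_mass coherence:", round(coherence, 3))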

