An integrated clustering and BERT framework for improved topic modeling.

P Sumathy, Lijimol George

doi:10.1007/s41870-023-01268-w

Abstract

Topic modeling is a machine learning technique widely used in Natural Language Processing (NLP) applications to infer topics within unstructured textual data. Latent Dirichlet Allocation (LDA) is one of the most widely used topic modeling techniques and can automatically detect topics in a large collection of text documents. However, LDA-based topic models alone do not always provide promising results. Clustering is an effective family of unsupervised machine learning algorithms that is extensively used in applications such as extracting information from unstructured textual data and topic modeling. This study examines in detail a hybrid model of Bidirectional Encoder Representations from Transformers (BERT) and LDA for topic modeling, with clustering based on dimensionality reduction. Because clustering algorithms are computationally complex and their complexity grows with the number of features, dimensionality reduction with PCA, t-SNE and UMAP is also applied. Finally, a unified clustering-based framework using BERT and LDA is proposed for mining a set of meaningful topics from massive text corpora. Experiments that simulate user input on benchmark datasets demonstrate the effectiveness of the cluster-informed topic modeling framework. The experimental results show that clustering with dimensionality reduction helps infer more coherent topics, and hence that this unified clustering and BERT-LDA based approach can be effectively used for building topic modeling applications.
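
The abstract does not spell out the pipeline, so the following is only a minimal sketch of the general idea it describes: combine per-document LDA topic probabilities with BERT embeddings, reduce the dimensionality, then cluster. Everything concrete here is an assumption made for illustration: the function names, the weighting parameter `gamma`, the use of PCA (one of the three reduction methods the abstract names) and plain k-means as the clustering step, and the synthetic arrays that stand in for real LDA and BERT outputs.

```python
import numpy as np


def fuse_vectors(lda_probs, bert_embeddings, gamma=15.0):
    """Concatenate LDA topic probabilities (scaled by gamma) with embeddings.

    gamma is a hypothetical weight that balances the two representations.
    """
    lda_probs = np.asarray(lda_probs, dtype=float)
    bert_embeddings = np.asarray(bert_embeddings, dtype=float)
    if lda_probs.shape[0] != bert_embeddings.shape[0]:
        raise ValueError("both inputs need one row per document")
    return np.hstack([gamma * lda_probs, bert_embeddings])


def pca_reduce(X, n_components=2):
    """Project the rows of X onto the top principal components using SVD."""
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def assign_labels(X, centroids):
    """Return the index of the nearest centroid for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign_labels(X, centroids)
        # Keep the old centroid if a cluster happens to be empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign_labels(X, centroids), centroids


# Synthetic stand-ins (NOT real model outputs): 6 documents, 2 LDA topics,
# 8-dimensional "embeddings" drawn around two well-separated centres.
rng = np.random.default_rng(42)
lda_probs = np.array([[0.9, 0.1]] * 3 + [[0.1, 0.9]] * 3)
bert_embeddings = np.vstack([
    rng.normal(loc=1.0, scale=0.1, size=(3, 8)),
    rng.normal(loc=-1.0, scale=0.1, size=(3, 8)),
])

fused = fuse_vectors(lda_probs, bert_embeddings)
reduced = pca_reduce(fused, n_components=2)
labels, centroids = kmeans(reduced, k=2)
```

In a real setting, `lda_probs` would come from a fitted LDA model and `bert_embeddings` from a sentence encoder, and the topics would be read off by inspecting the most frequent terms in each resulting cluster. Reducing the fused vectors before clustering keeps the distance computations cheap, which is the motivation the abstract gives for the dimensionality reduction step.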

Journal: International journal of information technology : an official journal of Bharati Vidyapeeth's Institute of Computer Applications and Management
Publication Date: Apr 1, 2023
Citations: 11

Similar Papers

Semantic Topic Extraction from Bangla News Corpus Using LDA and BERT-LDA
Pintu Chandra Paul ... Mohammed Moshiul Hoque
17 Dec 2022

Comparing human coding to two natural language processing algorithms in aspirations of people affected by Duchenne Muscular Dystrophy
Elijah Biletch ... Carolyn E Schwartz
Journal of Methods and Measurement in the Social Sciences | VOL. 13
01 Oct 2022

Bidirectional encoders to state-of-the-art: a review of BERT and its transformative impact on natural language processing
Rajesh Gupta
Информатика. Экономика. Управление - Informatics. Economics. Management | VOL. 3
02 Mar 2024

A case study of using natural language processing to extract consumer insights from tweets in American cities for public health crises
Ye Wang ... Yugyung Lee
BMC Public Health | VOL. 23
24 May 2023
