Abstract

Topic modelling is important for tackling several data mining tasks in information retrieval. Seminal techniques such as Latent Dirichlet Allocation (LDA) remain widely used, but the ubiquity of social media and the brevity of its texts pose unique challenges for these traditional models. Several extensions, including auxiliary aggregation, self-aggregation and direct learning, have been proposed to mitigate these challenges; however, some remain, notably a lack of consistency in the topics generated and a decline in model performance on corpora with disparate document lengths. A recent paradigm shift towards neural topic models offers an alternative, but such models are poorly suited to resource-constrained environments. This paper revisits LDA-style techniques, taking a theoretical approach to analysing the relationship between word co-occurrence and topic models. Our analysis shows that altering the word co-occurrences within a corpus can enhance topic discovery. We therefore propose a novel data transformation approach, dubbed DATM, to improve topic discovery within a corpus. A rigorous empirical evaluation shows that DATM is not only powerful on its own, but can also be combined with existing benchmark techniques to significantly improve their effectiveness and consistency, by up to two-fold.
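The abstract attributes the short-text problem to sparse word co-occurrence and cites aggregation-style extensions as one mitigation. As background only, the sketch below illustrates that general aggregation idea with gensim's LDA implementation: short texts sharing a key are pooled into pseudo-documents before inference, so that more words co-occur within each training document. The toy corpus and the pooling key are hypothetical, and this is not the paper's DATM transformation, whose details the abstract does not specify.

```python
# Minimal sketch of aggregation-style preprocessing before LDA.
# NOT the paper's DATM method; the corpus and pooling key are toy examples.
from collections import defaultdict

from gensim import corpora
from gensim.models import LdaModel

# Toy short texts, each tagged with a pooling key (e.g. an author id).
short_texts = [
    ("u1", ["football", "match", "goal"]),
    ("u1", ["goal", "score", "league"]),
    ("u2", ["election", "vote", "policy"]),
    ("u2", ["policy", "debate", "vote"]),
]

# Aggregation: concatenate texts sharing a key, so words that rarely
# co-occur in one short text now co-occur in a single pseudo-document,
# enriching the co-occurrence statistics that LDA relies on.
pooled = defaultdict(list)
for key, tokens in short_texts:
    pooled[key].extend(tokens)
pseudo_docs = list(pooled.values())

dictionary = corpora.Dictionary(pseudo_docs)
bow = [dictionary.doc2bow(doc) for doc in pseudo_docs]

lda = LdaModel(corpus=bow, id2word=dictionary, num_topics=2,
               passes=20, random_state=0)
for topic_id, terms in lda.show_topics(num_topics=2, num_words=3,
                                       formatted=False):
    print(topic_id, [word for word, _ in terms])
```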
