BATS: A Spectral Biclustering Approach to Single Document Topic Modeling and Segmentation

Qiong Wu,Zhenming Liu,Yanhua Li,Christopher G Brinton,Adam Hare,Yuwei Tu,Sirui Wang

doi:10.1145/3468268

Abstract

Existing topic modeling and text segmentation methodologies generally require large datasets for training, limiting their capabilities when only small collections of text are available. In this work, we reexamine the inter-related problems of “topic identification” and “text segmentation” for sparse document learning, when there is a single new text of interest. In developing a methodology to handle single documents, we face two major challenges. First is sparse information: with access to only one document, we cannot train traditional topic models or deep learning algorithms. Second is significant noise: a considerable portion of words in any single document will produce only noise and not help discern topics or segments. To tackle these issues, we design an unsupervised, computationally efficient methodology called Biclustering Approach to Topic modeling and Segmentation (BATS). BATS leverages three key ideas to simultaneously identify topics and segment text: (i) a new mechanism that uses word order information to reduce sample complexity, (ii) a statistically sound graph-based biclustering technique that identifies latent structures of words and sentences, and (iii) a collection of effective heuristics that remove noise words and award important words to further improve performance. Experiments on six datasets show that our approach outperforms several state-of-the-art baselines when considering topic coherence, topic diversity, segmentation, and runtime comparison metrics.
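To make the idea of biclustering words and sentences concrete, here is a minimal sketch of a generic spectral co-clustering step on a sentence-by-word count matrix. It is not the BATS algorithm: it omits the word-order mechanism and the noise-word heuristics described in the abstract, and it swaps in the standard normalized-SVD bipartition of a bipartite graph. The function names `sentence_word_matrix` and `spectral_bipartition`, the regex tokenizer, and the two-cluster restriction are all assumptions made for illustration.

```python
import re
import numpy as np


def sentence_word_matrix(sentences):
    """Build a sentence-by-word count matrix (hypothetical helper).

    Assumes every sentence contains at least one alphabetic token,
    so that no row of the matrix is all zeros.
    """
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: j for j, w in enumerate(vocab)}
    A = np.zeros((len(sentences), len(vocab)))
    for i, toks in enumerate(tokenized):
        for w in toks:
            A[i, index[w]] += 1.0
    return A, vocab


def spectral_bipartition(A):
    """Split sentences (rows) and words (columns) into two co-clusters.

    Generic spectral co-clustering sketch, not the BATS method:
    scale A by the inverse square roots of its row and column sums,
    take the second singular vector pair, and split by sign.
    """
    d1 = A.sum(axis=1)  # sentence degrees (row sums)
    d2 = A.sum(axis=0)  # word degrees (column sums)
    An = A / np.sqrt(np.outer(d1, d2))
    U, _, Vt = np.linalg.svd(An, full_matrices=False)
    z_rows = U[:, 1] / np.sqrt(d1)
    z_cols = Vt[1, :] / np.sqrt(d2)
    return (z_rows > 0).astype(int), (z_cols > 0).astype(int)


if __name__ == "__main__":
    doc = [
        "the cat chased the mouse",
        "the mouse hid from the cat",
        "stocks fell as the market closed",
        "the market rallied and stocks rose",
    ]
    A, vocab = sentence_word_matrix(doc)
    sentence_labels, word_labels = spectral_bipartition(A)
    print(sentence_labels)
    print(dict(zip(vocab, word_labels)))
```

The cluster labels 0 and 1 are arbitrary because singular vectors are only defined up to sign, so only the grouping is meaningful. A shared word such as "the" connects both groups and lands in whichever cluster the sign split assigns it to, which hints at why a single-document method needs some way to handle noise words.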

Journal: ACM Transactions on Intelligent Systems and Technology
Publication Date: Oct 15, 2021
Citations: 6

Similar Papers

MIT: Mutual Information Topic Model for Diverse Topic Extraction.
Rui Wang ... Yongquan Zhou
IEEE Transactions on Neural Networks and Learning Systems | VOL. PP
01 Jan 2024

Investigating the Efficient Use of Word Embedding with Neural-Topic Models for Interpretable Topics from Short Texts.
Riki Murakami ... Basabi Chakraborty
Sensors | VOL. 22
23 Jan 2022

Analysing political events on Twitter
Anjie Fang
ACM SIGIR Forum | VOL. 53
01 Jun 2019

Depression, anxiety, and burnout in academia: topic modeling of PubMed abstracts
Olga Lezhnina
Frontiers in Research Metrics and Analytics | VOL. 8
27 Nov 2023
