An unsupervised topic segmentation model incorporating word order

Shoaib Jameel, Wai Lam

doi:10.1145/2484028.2484062

Abstract

We present a new unsupervised topic discovery model for a collection of text documents. In contrast to the majority of state-of-the-art topic models, our model does not break the document's structure, such as paragraphs and sentences, and it preserves word order in the document. As a result, it can generate two levels of topics of different granularity, namely segment-topics and word-topics, and it can generate n-gram words in each topic. We also develop an approximate inference scheme using Gibbs sampling. We conduct extensive experiments using publicly available data from different collections and show that our model improves the quality of several text mining tasks: supporting fine-grained topics with n-gram words in the correlation graph, segmenting a document into topically coherent sections, document classification, and document likelihood estimation.

Similar Papers

The fictionality of topic modeling: Machine reading Anthony Trollope's Barsetshire series
Rachel Sagner Buurma
Big Data & Society | VOL. 2
01 Dec 2015

Deep NMF topic modeling
Jianyu Wang ... Xiao-Lei Zhang
Neurocomputing | VOL. 515
19 Oct 2022

Urdu Documents Clustering with Unsupervised and Semi-Supervised Probabilistic Topic Modeling
Mubashar Mustafa ... Hussain Ghulam
Information | VOL. 11
05 Nov 2020

Enriching text representation with frequent pattern mining for probabilistic topic modeling
Hyun Duk Kim ... Yue Lu
Proceedings of the American Society for Information Science and Technology | VOL. 49
01 Jan 2012
