An Embedding-Based Topic Model for Document Classification

Sattar Seifollahi, Massimo Piccardi, Alireza Jolfaei

doi:10.1145/3431728

Abstract

Topic modeling is an unsupervised learning task that discovers the hidden topics in a collection of documents. In turn, the discovered topics can be used for summarizing, organizing, and understanding the documents in the collection. Most existing techniques for topic modeling are derivatives of Latent Dirichlet Allocation, which uses a bag-of-words assumption for the documents. However, bag-of-words models completely dismiss the relationships between the words. For this reason, this article presents a two-stage algorithm for topic modeling that leverages word embeddings and word co-occurrence. In the first stage, we determine the topic-word distributions by soft-clustering a random set of embedded n-grams from the documents. In the second stage, we determine the document-topic distributions by sampling the topics of each document from the topic-word distributions. This approach leverages the distributional properties of word embeddings instead of using the bag-of-words assumption. Experimental results on various data sets from an Australian compensation organization show the remarkable comparative effectiveness of the proposed algorithm in a document classification task.
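
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering method or the sampling scheme, so this sketch assumes a soft k-means (softmax over negative squared distances) for the first stage and a per-token topic draw proportional to the topic-word probabilities for the second. The toy documents, the random "embeddings", and every function name and parameter (soft_kmeans, beta, smooth, alpha) are hypothetical and chosen only for illustration.

```python
import numpy as np


def ngrams(tokens, n=2):
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def soft_kmeans(X, k, rng, beta=5.0, iters=50):
    """Assumed soft-clustering step: softmax responsibilities over
    negative squared distances to k centroids."""
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        logits = -beta * d2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)
        centroids = (resp.T @ X) / (resp.sum(axis=0)[:, None] + 1e-12)
    return resp  # shape: (number of n-grams, k)


def topic_word_distributions(docs, emb, word_id, k, rng, n=2,
                             sample_size=12, smooth=1e-3):
    """Stage 1 (sketch): soft-cluster a random sample of embedded n-grams
    and accumulate each n-gram's responsibilities onto its words."""
    all_ngrams = [g for doc in docs for g in ngrams(doc, n)]
    size = min(sample_size, len(all_ngrams))
    picked = rng.choice(len(all_ngrams), size=size, replace=False)
    sample = [all_ngrams[i] for i in picked]
    # An n-gram is embedded as the mean of its word vectors (an assumption).
    X = np.array([np.mean([emb[w] for w in g], axis=0) for g in sample])
    resp = soft_kmeans(X, k, rng)
    phi = np.full((k, len(word_id)), smooth)
    for r, g in zip(resp, sample):
        for w in g:
            phi[:, word_id[w]] += r
    return phi / phi.sum(axis=1, keepdims=True)  # each row sums to 1


def document_topic_distribution(doc, phi, word_id, rng, alpha=0.1):
    """Stage 2 (sketch): draw one topic per token with probability
    proportional to phi[topic, word], then normalise the counts."""
    k = phi.shape[0]
    counts = np.full(k, alpha)
    for w in doc:
        p = phi[:, word_id[w]]
        z = rng.choice(k, p=p / p.sum())
        counts[z] += 1
    return counts / counts.sum()


# Made-up toy corpus; the random vectors stand in for pretrained embeddings.
rng = np.random.default_rng(0)
docs = [
    "worker injured back lifting box".split(),
    "claim lodged for back injury".split(),
    "payment delayed invoice not received".split(),
    "invoice sent payment still pending".split(),
]
vocab = sorted({w for doc in docs for w in doc})
word_id = {w: i for i, w in enumerate(vocab)}
emb = {w: rng.normal(size=8) for w in vocab}

phi = topic_word_distributions(docs, emb, word_id, k=2, rng=rng)
theta = np.array([document_topic_distribution(d, phi, word_id, rng)
                  for d in docs])
print(theta.round(2))
```

Each row of theta is a document-topic vector that could serve as the feature input to any standard classifier, which is how a topic model is typically used for document classification. With random vectors in place of real embeddings the topics themselves carry no meaning; the sketch only shows the data flow between the two stages.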

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Publication Date: May 5, 2021
Citations: 9

Similar Papers

Short Text Topic Model with Word Embeddings and Context Information
Xianchao Zhang ... Ran Feng
27 Jun 2018

Collaboratively Modeling and Embedding of Latent Topics for Short Texts
Zheng Liu ... Yun Li
IEEE Access | VOL. 8
01 Jan 2020

Weakly supervised topic sentiment joint model with word embeddings
Xianghua Fu ... Joshua Zhexue Huang
Knowledge-Based Systems | VOL. 147
09 Feb 2018

Word Embeddings for Natural Language Processing
01 Jan 2015
