Topic Modelling Using VSM-LDA For Document Summarization

Agus Zainal Arivin, Luthfi Atikah, Novrindah Alvi Hasanah

doi:10.31937/ti.v14i2.2854

Abstract

Summarization simplifies the contents of a document by eliminating elements that are considered unimportant, without reducing the core meaning the document is meant to convey. A document, however, often contains more than one topic, so those topics need to be identified to make the summarization process more effective. Latent Dirichlet Allocation (LDA) is a commonly used topic identification method. However, LDA suffers from "order effects": the topics it produces change when the order of the training data is changed. Given the same input documents, LDA can therefore return inconsistent topics, which results in low coherence values. This paper therefore proposes a topic modelling method that combines LDA with the Vector Space Model (VSM) for automatic summarization. The proposed method can overcome order effects, and it identifies document topics from the TF-IDF weights in the VSM generated by LDA. On 1,300 Twitter data items, the proposed topic modelling method achieved a highest coherence value of 0.72. The resulting summaries obtained a ROUGE-1 of 0.78, a ROUGE-2 of 0.67, and a ROUGE-L of 0.80.
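The abstract says topics are identified from TF-IDF weights in a vector space model (VSM). To make that weighting concrete, here is a minimal sketch, assuming raw term counts as TF and idf = log(N / df), which is one common variant. The function names (`tfidf_vectors`, `cosine`, `top_terms`) and the toy tokens are hypothetical, and this is not the authors' VSM-LDA implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per tokenized document.

    Assumed variant: raw term count as TF and idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # A term that occurs in every document gets idf = log(1) = 0.
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_terms(vector: dict[str, float], k: int) -> list[str]:
    """Hypothetical helper: the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(vector.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy, hand-made tokens; not data from the paper.
    docs = [
        ["flood", "jakarta", "rain"],
        ["rain", "traffic", "jakarta"],
        ["election", "vote"],
    ]
    vecs = tfidf_vectors(docs)
    print(top_terms(vecs[0], 1))               # ['flood']
    print(round(cosine(vecs[0], vecs[1]), 3))  # 0.214
    print(cosine(vecs[0], vecs[2]))            # 0.0 (no shared terms)
```

Ranking terms by TF-IDF weight is only one plausible way to pick representative topic words. The abstract does not specify how the paper combines these weights with the LDA output.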

Journal: Ultimatics : Jurnal Teknik Informatika
Publication Date: Dec 30, 2022
License type: CC BY-SA 4.0

Similar Papers

Probabilistic Document Correlation Model
...
15 Dec 2007

On the Equivalence of Information Retrieval Methods for Automated Traceability Link Recovery
Rocco Oliveto ... Malcom Gethers
13 Jul 2020

Power Series Representation Model of Text Knowledge Based on Human Concept Learning
Xiangfeng Luo ... Jun Zhang
IEEE Transactions on Systems, Man, and Cybernetics: Systems, Vol. 44
01 Jan 2014

Models, Inference, and Implementation for Scalable Probabilistic Models of Text
01 Jan 2014
