Abstract

Topic modeling is a statistical technique for discovering hidden topics or keywords in a collection of documents. It is also considered a probabilistic model for learning, analyzing, and discovering topics from the document collection. The most popular techniques for topic modeling are latent semantic analysis (LSA), probabilistic latent semantic analysis (pLSA), latent Dirichlet allocation (LDA), and the recent deep learning-based lda2vec. LDA is most commonly used in extractive multi-document summarization to determine whether an extracted sentence reflects the concept of the input document. In this paper, we explore various multi-document summarization techniques that use LDA as a topic modeling method for improving final summary coverage and reducing redundancy. Finally, we compare LDA and LSA using the Gensim toolkit, and our experimental results show that LDA outperforms LSA when the number of features considered for sentence selection is increased.

Keywords: Topic modeling, Text summarization, LDA, LSA
