A Comprehensive Survey on Topic Modeling in Text Summarization

G Bharathi Mohan,R Prasanna Kumar

doi:10.1007/978-981-16-8721-1_22

Abstract

AbstractTopic modeling is the statistical model for discovering hidden topics or keywords in a collection of documents. Topic modeling is also considered a probabilistic model for learning, analyzing, and discovering topics from the document collection. The most popular techniques for topic modeling are latent semantic analysis (LSA), probabilistic latent semantic analysis (pLSA), latent Dirichlet allocation (LDA), and the recent deep learning-based lda2vec. LDA is most commonly used in extractive multi-document summarization to determine whether the extracted sentence reflects the concept of the input document. In this paper, we will try to explore various multi-document summarization techniques that use LDA as a topic modeling method for improving final summary coverage and to reduce redundancy. Finally, we compared LDA and LSA using the Genism toolkit, and our experiment results show that LDA outperforms LSA if we increase the number of features considered for sentence selection.KeywordsTopic modelingText summarizationLDALSA

Full Text