Abstract

Since the dawn of the Internet, the volume of textual data has grown steadily, driven by the widespread use of digital libraries, social media, and online search engines, and with it the storage of vast amounts of raw text. Extracting useful content from this text to produce a credible summary is a challenging task. This work reports on the implementation of two-level document summarization using latent Dirichlet allocation (LDA). The proposed method consists of two steps. In the first step, two summarization procedures, (a) maximal marginal relevance (MMR) and (b) TextRank (TR), generate summaries of a document from a given corpus with multiple topics. In the second step, the LDA modeling algorithm is applied to the summaries generated by MMR and TR in the first step. This process yields shorter summaries with differentiated topics. The input corpus consists of customer opinion reviews of products and hotels (Sect. 4.1). The performance of two-level document summarization (DS) using LDA is compared with that of MMR and TR. The comparison shows that two-level document summarization using LDA generates better summaries (Sect. 5).
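The two steps described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation, and several choices are assumptions made for illustration: the bag-of-words cosine similarity, the MMR trade-off `lam=0.7`, the use of a small collapsed Gibbs sampler for LDA inference, and the second-level rule of keeping one sentence per dominant topic. The abstract does not specify how the LDA output is turned into the shorter summary, so that rule in particular is hypothetical. The review sentences are made up, and TextRank is omitted for brevity.

```python
import math
import random
import re
from collections import Counter

def tokenize(text):
    # Simple lowercase word tokens; real preprocessing would be richer.
    return re.findall(r"[a-z]+", text.lower())

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def mmr_summary(sentences, k, lam=0.7):
    # Level 1: greedy MMR. Relevance is similarity to the whole document;
    # redundancy is the maximum similarity to an already selected sentence.
    vecs = [Counter(tokenize(s)) for s in sentences]
    doc = sum(vecs, Counter())
    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * cosine(vecs[i], doc) - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in sorted(selected)]

def lda_gibbs(docs, n_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    # Level 2: collapsed Gibbs sampling for LDA over tokenized documents.
    # Returns the per-document topic counts.
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics

    def bump(d, w, z, delta):
        doc_topic[d][z] += delta
        topic_word[z][w] += delta
        topic_total[z] += delta

    assign = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, z in zip(doc, assign[d]):
            bump(d, w, z, 1)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                bump(d, w, assign[d][n], -1)  # remove the current assignment
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + beta * vocab_size)
                    for t in range(n_topics)
                ]
                assign[d][n] = rng.choices(range(n_topics), weights=weights)[0]
                bump(d, w, assign[d][n], 1)
    return doc_topic

def two_level_summary(sentences, k, n_topics):
    # Assumed second-level rule: keep, for each topic, the sentence whose
    # tokens are most concentrated on that topic.
    level1 = mmr_summary(sentences, k)
    doc_topic = lda_gibbs([tokenize(s) for s in level1], n_topics)
    best = {}
    for d, counts in enumerate(doc_topic):
        topic = counts.index(max(counts))
        share = counts[topic] / max(1, sum(counts))
        if topic not in best or share > best[topic][0]:
            best[topic] = (share, d)
    return level1, [level1[d] for d in sorted(d for _, d in best.values())]

# Hypothetical review sentences, for illustration only.
reviews = [
    "The hotel room was clean and the bed was comfortable.",
    "The room was clean and comfortable.",
    "Staff at the hotel were friendly and helpful.",
    "The phone battery lasts all day.",
    "Battery life on this phone is excellent.",
    "The phone camera takes sharp photos.",
]
level1, level2 = two_level_summary(reviews, k=4, n_topics=2)
print(level1)
print(level2)
```

The second-level output is always a subset of the first-level summary and contains at most `n_topics` sentences, which mirrors the "shorter summaries with differentiated topics" described above. Which sentences end up in it depends on the sampler's seed and hyperparameters.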
