Abstract

Since the dawn of the Internet, the volume of textual data has grown steadily, driven by the everyday use of digital libraries, social media, and online search engines, all of which store enormous amounts of raw text. Gleaning useful content from this text to produce a credible summary is a challenging task. This work reports on the implementation of two-level document summarization using latent Dirichlet allocation (LDA). The proposed method consists of two steps. The first step applies two summarization procedures, (a) maximal marginal relevance (MMR) and (b) TextRank (TR), which generate summaries of the documents in a given multi-topic corpus. In the second step, the LDA modeling algorithm is applied to the summaries generated by MMR and TR in the previous step. This process produces shorter summaries with differentiated topics. We used customer opinion reviews of products and hotels as the input corpus (Sect. 4.1). The performance of this two-level document summarization (DS) using LDA is compared with that of MMR and TR. The comparison results show that two-level document summarization using LDA generates better summaries (Sect. 5).
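To make the first-level MMR step concrete, the following is a minimal sketch rather than the authors' implementation. It assumes bag-of-words cosine similarity and uses the whole document as the query. The function names (`bow`, `cosine`, `mmr_summary`) and the parameters `k` and `lam` are hypothetical, chosen for illustration only.

```python
import math
import re
from collections import Counter


def bow(sentence):
    """Lower-cased bag-of-words vector for one sentence (an assumed representation)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_summary(sentences, k=2, lam=0.7):
    """Greedily pick k sentences, trading relevance against redundancy.

    Each candidate is scored as
        lam * sim(candidate, document) - (1 - lam) * max sim(candidate, already selected)
    where the document vector is the sum of all sentence vectors (an assumption).
    """
    vectors = [bow(s) for s in sentences]
    query = sum(vectors, Counter())
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vectors[i], query)
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(selected)]


reviews = [
    "the room was clean",
    "the room was clean",
    "breakfast was cold and late",
]
summary = mmr_summary(reviews, k=2, lam=0.5)
```

In this toy input the duplicated sentence is penalized by the redundancy term, so the second pick is the breakfast sentence rather than the repeat. In the two-level scheme described above, summaries produced this way (or by TextRank) would then be passed to an LDA topic model, which is not shown here.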
