Abstract

Document summarization is a challenging task in text mining. Condensing a large document into a concise set of sentences that is a subset of the initial text is called extractive summarization. Text summarization has many applications; here, CNN news articles are summarized to their key sentences. In this project, the Latent Dirichlet Allocation (LDA) topic modeling algorithm is used to generate extractive summaries: it captures the important topics in the text, and a distribution-weighting mechanism then selects sentences from the text. The model performs well on the data and fetches a summary for each news article, which saves the time needed to read long texts or documents.

Document summarization is a means of deriving significant and relevant data from a document and assembling it into a comprehensive, meaningful piece of information. In this project, extractive summarization of large documents is carried out by segmenting the document into a list of sentences and applying the LDA algorithm to extract the main topics. Then, using the frequency of those topic words in each sentence, the sentences with the highest topic distribution are extracted to summarize the text. The report is structured as follows: the literature review in Section II discusses the work of various authors on document summarization and LDA; Section III specifies the methodology implemented with the LDA model, including data processing; empirical results in topic modeling and document summarization are discussed in Section IV; finally, Section V presents the conclusion and future scope.

Summarizing this information is of great importance. Document summarization has become a significant research area in Natural Language Processing (NLP) and Big Data. The extractive summarization using the LDA topic modeling algorithm successfully generates a summary of important sentences from the original document.
It also provides a good level of topic diversity. In future work, we would like to investigate additional objective functions, improve the summary generation further, and use diverse topic modeling techniques. We also intend to evaluate our approach on multiple languages. There is future scope for generating abstractive summaries, which are more human-like and will require heavy machine learning tools for semantic language generation.
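The pipeline described above (segment the document into sentences, fit LDA to find the main topics, score each sentence by the weight of topic words it contains, and keep the highest-scoring sentences) can be sketched as follows. This is a minimal illustration using scikit-learn, not the project's actual implementation; the function name `lda_summarize` and the scoring scheme (taking each word's maximum per-topic weight) are illustrative assumptions.

```python
# Sketch of LDA-based extractive summarization (illustrative, not the
# paper's exact method). Assumes scikit-learn and NumPy are available.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def lda_summarize(sentences, n_topics=2, n_sentences=2):
    """Score each sentence by the LDA topic weights of its words and
    return the top-scoring sentences in their original order."""
    # Build a sentence-term count matrix (each sentence is a "document").
    vectorizer = CountVectorizer(stop_words="english")
    dtm = vectorizer.fit_transform(sentences)

    # Fit LDA on the sentence-term matrix to discover the main topics.
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(dtm)

    # Normalize topic-word weights per topic, then take each word's best
    # topic weight as its importance (one possible weighting mechanism).
    topic_word = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
    word_weight = topic_word.max(axis=0)

    # A sentence's score is the summed weight of the topic words it contains.
    scores = dtm @ word_weight
    top = np.argsort(scores)[::-1][:n_sentences]
    return [sentences[i] for i in sorted(top)]

sents = [
    "The cat sat on the mat quietly.",
    "Dogs bark loudly at strangers.",
    "The cat chased a small mouse.",
    "Stock prices rose sharply today.",
]
summary = lda_summarize(sents, n_topics=2, n_sentences=2)
```

With a real corpus, the sentence list would come from segmenting a full news article, and the number of topics would be tuned to the document collection.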

