Extractive Text and Video Summarization using TF-IDF Algorithm

Ajinkya Gothankar, Lavish Gupta, Samiksha Nehe, Prof. Monali Bansode, Niharika Bisht

doi:10.22214/ijraset.2022.40775

Abstract

Text summarization is a technique for producing a concise summary of a large text without sacrificing any important information, which makes it a good way to extract crucial information from documents. The rapid rise of the internet has resulted in a substantial surge in data all across the world, and it has become difficult for humans to summarize large documents manually. Automatic text summarization is an NLP technique that reduces the time and effort a human needs to create a summary. Text summarization techniques are divided into two categories: extractive and abstractive. In the extractive approach, sentences are chosen from the document based on a set of criteria. In the abstractive approach, the technique strives to improve sentence coherence by reducing redundancies and explaining the context of sentences. The extractive approach is the subject of this paper. There are several methods for summarizing data, including TF-IDF, TextRank, PageRank, and Latent Dirichlet Allocation (LDA). This work examines text summarization using the TF-IDF algorithm, a numerical measure that ranks the importance of a word in a document based on how frequently it appears in that document and across a set of documents. The application of the TF-IDF algorithm to text, document, article, and video summarization is described in this study. The results contain no repetitions and, for some searches, are nearly identical to summaries produced by humans. The algorithm offers a sentence extraction technique that selects the most diverse top-ranked sentences.

Keywords: Extractive Summarization, Term Frequency-Inverse Document Frequency (TF-IDF), Natural Language Processing (NLP), Text Summarization, Video Summarization
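
To make the idea of TF-IDF-based sentence extraction concrete, the following is a minimal Python sketch, not the authors' implementation. It rests on several assumptions that the abstract does not state: each sentence is treated as its own "document" when computing inverse document frequency, a sentence is scored by the mean TF-IDF weight of its distinct terms, and the sentence splitter and tokenizer are deliberately naive. The function names `split_sentences`, `tokenize`, and `summarize` are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase alphanumeric tokens; no stop-word removal or stemming.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def summarize(text, k=2):
    """Return the k highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Document frequency: number of sentences that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        # TF = count / sentence length, IDF = log(n / df).
        total = sum(
            (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        )
        # Assumed scoring rule: mean TF-IDF over the distinct terms.
        scores.append(total / len(tf))

    # Pick the top-k indices, then restore document order for readability.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(text, k=3)` on a plain-text string returns a list of three sentences taken verbatim from the input. A term that occurs in every sentence receives an IDF of zero under this formula, so it contributes nothing to any sentence's score; a production system would likely add smoothing, stop-word filtering, and a more robust sentence splitter.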

Full Text