Video-to-Text Summarization using Natural Language Processing

Prerna Mishra, Kartik Garg, Naveen Rathi

doi:10.48175/ijarsct-9160

Abstract

Video summarization aims to produce a high-quality text summary of a video that conveys its important information, or gist, to users. The process first converts video files to audio files and then converts the audio to text. The pipeline relies on the transformer architecture from Natural Language Processing. Although text summarization has been studied extensively, we present our own model, an extractive-video-summarizer, that uses state-of-the-art pre-trained ML models and open-source libraries at its core. The extractive-video-summarizer follows this pipeline: (I) preparation of a multidisciplinary dataset of videos, (II) extraction of audio from video files, (III) text generation from audio files, (IV) text summarization using extractive summarizers, and (V) entity extraction. We conducted our research primarily on two languages widely used in India, Hindi and English. In our experiments, the model performs well and generates appropriate tags for videos.
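Step (IV) of the pipeline above, extractive summarization, can be illustrated with a minimal sketch. It is not the authors' model: the abstract states that pre-trained transformer models are used, whereas this sketch substitutes a simple word-frequency scorer so that it runs with the Python standard library alone. The function names, the stopword list, and the sample transcript are hypothetical, and the tokenizer handles only lowercase English letters, so Hindi transcripts would need a different tokenizer.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {
    "the", "a", "an", "is", "are", "of", "to", "and", "in", "it",
    "that", "this", "for", "on", "with", "as", "was", "be",
}


def split_sentences(text):
    """Split a transcript into sentences after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def content_words(text):
    """Lowercase English word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text, k=2):
    """Return the k highest-scoring sentences, kept in their original order.

    A sentence's score is the mean transcript-wide frequency of its
    content words, so sentences that repeat the dominant topic rank higher.
    """
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        if not tokens:
            return 0.0
        return sum(freq[t] for t in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:k])  # restore original sentence order
    return [sentences[i] for i in chosen]


# Hypothetical transcript standing in for the output of step (III).
transcript = (
    "Solar panels convert sunlight into electricity. "
    "I had coffee this morning. "
    "Solar panels need sunlight to produce electricity efficiently. "
    "Thanks for watching."
)
summary = extractive_summary(transcript, k=2)
```

Because the summary is built only from sentences that already occur in the transcript, the output is extractive rather than abstractive, which matches the kind of summarizer named in step (IV).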

Full Text