Abstract

The number of videos accessible on web platforms is growing continuously, and an increasing share of online content is delivered as video or audio. Unlike images, where information can be gathered from a single frame, videos require a viewer to watch them in their entirety to fully understand the context, which presents a significant challenge for information extraction. Previously, seq2seq models were used to predict summaries of paragraphs or transcripts, but recent advancements in transfer learning surpass the accuracy obtained by standard NLP algorithms such as TextRank and seq2seq models. Moreover, most standard models focus on extractive rather than abstractive summarization. Therefore, this study uses Hugging Face Transformers to summarize the generated transcripts, as they enable abstractive summarization and achieve greater accuracy than previous standard NLP algorithms and models. The system proposed in this paper converts audio chunks from input videos into transcripts. These transcripts are then passed to a summarization model based on natural language processing and transfer learning. This paper presents a comparative analysis of several transformer models, selects the one that performs best, and incorporates it into the proposed system. The developed system accepts user-supplied video links as input and outputs a summary-like description of the video.
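As a minimal illustration of the transcript-summarization step described above, the sketch below passes a transcript to a Hugging Face summarization pipeline. The specific checkpoint (facebook/bart-large-cnn) and the length parameters are assumptions chosen for demonstration; the paper's comparative analysis selects its own best-performing transformer model.

    # A minimal sketch of the summarization step using the Hugging Face
    # Transformers library. The checkpoint "facebook/bart-large-cnn" and the
    # length limits are illustrative assumptions, not the paper's final choice.
    from transformers import pipeline

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

    # A transcript produced from the input video's audio chunks (placeholder).
    transcript = "Transcript text generated from the video's audio chunks ..."

    # Generate an abstractive summary of the transcript.
    result = summarizer(transcript, max_length=130, min_length=30, do_sample=False)
    print(result[0]["summary_text"])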
