Abstract

A massive number of videos, containing audio, visual, and textual data, is produced every day. This constant increase is driven by the ease of recording on portable devices such as mobile phones, tablets, and cameras. The major challenge is to understand the visual semantics and convert them into a condensed format, such as a caption or a summary, which saves storage space and enables users to index, navigate, and extract information in less time. We propose an innovative joint end-to-end solution, Abstractive Summarization of Video Sequences, which uses a deep neural network to generate both a natural-language description and an abstractive text summary of an input video. The combined text-based video description and abstractive summary enable users to discriminate between relevant and irrelevant information according to their needs. Furthermore, our experiments show that the joint model attains better results than baseline methods trained on the separate tasks, producing informative, concise, and readable multi-line video descriptions and summaries in a human evaluation.
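To make the described pipeline concrete, below is a minimal PyTorch sketch of a joint architecture of the kind the abstract outlines: a video encoder conditions a description decoder, whose output text is then re-encoded and fed to an abstractive-summary decoder, with both decoders trained jointly. This is not the authors' implementation; all module choices, dimensions, and names (e.g. JointDescriberSummarizer) are illustrative assumptions.

```
# Hedged sketch of a joint video-description + abstractive-summarization
# model; all sizes and module choices below are assumptions, not the paper's.
import torch
import torch.nn as nn

VOCAB, EMB, HID, FEAT = 10000, 256, 512, 2048  # assumed vocab/feature sizes

class JointDescriberSummarizer(nn.Module):
    def __init__(self):
        super().__init__()
        # Encode a sequence of per-frame CNN features into one video state.
        self.video_enc = nn.GRU(FEAT, HID, batch_first=True)
        self.embed = nn.Embedding(VOCAB, EMB)
        # Decoder 1: multi-line video description, conditioned on the video.
        self.desc_dec = nn.GRU(EMB, HID, batch_first=True)
        # Re-encode the description, then decode the abstractive summary.
        self.sum_enc = nn.GRU(EMB, HID, batch_first=True)
        self.sum_dec = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, frame_feats, desc_tokens, sum_tokens):
        # frame_feats: (B, T_frames, FEAT); *_tokens: (B, T_text) word ids
        _, h = self.video_enc(frame_feats)              # video context state
        d, _ = self.desc_dec(self.embed(desc_tokens), h)
        desc_logits = self.out(d)                        # description scores
        _, h2 = self.sum_enc(self.embed(desc_tokens))    # encode description
        s, _ = self.sum_dec(self.embed(sum_tokens), h2)
        sum_logits = self.out(s)                         # summary scores
        # Joint training would sum cross-entropy losses over both outputs.
        return desc_logits, sum_logits

model = JointDescriberSummarizer()
feats = torch.randn(2, 40, FEAT)                 # 40 frames of CNN features
desc = torch.randint(0, VOCAB, (2, 30))          # teacher-forced description
summ = torch.randint(0, VOCAB, (2, 12))          # teacher-forced summary
d_logits, s_logits = model(feats, desc, summ)
print(d_logits.shape, s_logits.shape)            # (2, 30, VOCAB), (2, 12, VOCAB)
```

Sharing the output projection and training both decoders with a summed loss is one plausible way to realize the "joint end-to-end" coupling the abstract claims; the paper itself may couple the two tasks differently.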
