Abstract

Video summarization has recently gained considerable interest from the research community worldwide. It involves extracting important frames from a long input video so that the key events are captured in a much smaller yet comprehensive summary. Towards effective video summarization, this paper proposes a novel methodology based on an attention-over-attention (AoA) strategy. The proposed deep model combines multiple attention modules, including spatial, channel, and multi-headed attention. AoA enables the model to capture inter-spatial and inter-channel relationships between features effectively. The proposed attention module is applied over the set of existing features from video summarization datasets. Applying attention progressively highlights the most important content in the input frames, thereby producing more effective key frames. Several ablation studies have been performed to analyze the placement of the spatial and channel attention modules and to determine the best possible architecture, supported by both experiments and theoretical analysis of the AoA architecture. For the experimental work, two benchmark datasets have been used, and the performance has been compared with existing methods.
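The abstract does not give the exact architecture, but the general idea of stacking channel and spatial attention progressively can be sketched as follows. This is a minimal NumPy illustration under assumptions of my own (simple average-pool squeeze plus sigmoid gating, channel-first then spatial ordering); all function names here are hypothetical, not the authors':

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # feat: (C, H, W). Squeeze the spatial dimensions to get one
    # score per channel, then re-weight each channel map.
    squeeze = feat.mean(axis=(1, 2))           # (C,)
    weights = sigmoid(squeeze)                 # (C,) in (0, 1)
    return feat * weights[:, None, None]

def spatial_attention(feat):
    # feat: (C, H, W). Squeeze the channel dimension to get one
    # score per spatial location, then re-weight each position.
    squeeze = feat.mean(axis=0)                # (H, W)
    weights = sigmoid(squeeze)                 # (H, W) in (0, 1)
    return feat * weights[None, :, :]

def attention_over_attention(feat):
    # "Attention over attention": spatial attention is computed over
    # features that were already refined by channel attention, so the
    # second stage attends to the output of the first.
    return spatial_attention(channel_attention(feat))

# Example: CNN features of a single video frame (hypothetical shape).
frame_features = np.random.rand(64, 7, 7)
refined = attention_over_attention(frame_features)
print(refined.shape)
```

The ordering (channel first, then spatial) is exactly the kind of design choice the paper's ablation studies examine; swapping the two calls gives the alternative placement.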
