Abstract

Video summarization (VS) has recently attracted intense attention due to its many applications in computer vision, such as video retrieval, indexing, and browsing. Traditional VS research mostly targets the effectiveness of VS algorithms by introducing high-quality features and clusters for selecting representative visual elements. With the increasing density of vision sensor networks, there is a tradeoff between the processing time of VS methods and the representative quality of the generated summaries. Generating a video summary of significant importance while meeting the resource constraints of Internet of Things (IoT) surveillance networks is a challenging task. This article addresses this problem by proposing a computationally efficient solution: a deep CNN framework with hierarchical weighted fusion for summarizing surveillance videos captured in IoT settings. In the first stage, our framework extracts discriminative, rich features from deep CNNs for shot segmentation. Second, we employ image memorability predicted by a fine-tuned CNN model, along with aesthetic and entropy features, to maintain the interestingness and diversity of the summary. Third, a hierarchical weighted fusion mechanism produces an aggregated score from the extracted features. Finally, an attention curve is constructed from the aggregated score to select outstanding keyframes for the final video summary. Experiments on benchmark data sets validate the importance and effectiveness of our framework, which outperforms other state-of-the-art schemes.
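The fusion-and-selection pipeline described above can be sketched minimally as follows. This is an illustrative assumption, not the paper's actual method: the feature weights, the min-max normalization, and the top-k keyframe selection are all placeholders for the hierarchical weighted fusion and attention-curve analysis the abstract describes.

```python
import numpy as np

def aggregate_scores(memorability, aesthetics, entropy, weights=(0.5, 0.3, 0.2)):
    """Fuse per-frame feature scores into one aggregated attention score.

    The weights here are illustrative; the paper's framework derives the
    aggregated score through a hierarchical weighted fusion mechanism.
    """
    feats = [np.asarray(f, dtype=float) for f in (memorability, aesthetics, entropy)]
    # Min-max normalize each feature stream to [0, 1] so scales are comparable.
    norm = []
    for f in feats:
        rng = f.max() - f.min()
        norm.append((f - f.min()) / rng if rng > 0 else np.zeros_like(f))
    return sum(w * f for w, f in zip(weights, norm))

def select_keyframes(attention, k=3):
    """Pick the k frames with the highest attention score (a simple stand-in
    for peak detection on the attention curve)."""
    attention = np.asarray(attention)
    return sorted(np.argsort(attention)[-k:].tolist())

# Toy per-frame scores for an 8-frame shot (hypothetical values).
mem = [0.2, 0.9, 0.4, 0.8, 0.1, 0.7, 0.3, 0.95]
aes = [0.5, 0.6, 0.2, 0.9, 0.3, 0.8, 0.1, 0.7]
ent = [0.3, 0.7, 0.5, 0.6, 0.2, 0.9, 0.4, 0.8]

curve = aggregate_scores(mem, aes, ent)   # the attention curve
keyframes = select_keyframes(curve, k=3)  # indices of summary frames
```

In practice, the per-frame memorability, aesthetic, and entropy scores would come from the fine-tuned CNN and the handcrafted feature extractors, and the selected keyframes would be assembled into the final summary.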
