Multi-scale deep feature fusion based sparse dictionary selection for video summarization

Xiao Wu, Mingyang Ma, Shuai Wan, Xiuxiu Han, Shaohui Mei

doi:10.1016/j.image.2023.117006

Abstract

The explosive growth of video data poses a series of new challenges for computer vision, and video summarization (VS) is becoming increasingly important. Recent work has shown the effectiveness of sparse dictionary selection (SDS) based VS, which selects a representative set of frames that can sufficiently reconstruct a given video. Existing SDS based VS methods use conventional handcrafted features or single-scale deep features, which can limit summarization performance because the frame feature representation is underutilized. Deep learning techniques based on convolutional neural networks (CNNs) perform strongly across a variety of vision tasks, since CNNs provide excellent feature representations. In this paper, a multi-scale deep feature fusion based sparse dictionary selection method (MSDFF-SDS) is therefore proposed for VS. Specifically, the multi-scale features comprise features extracted directly from the last fully connected layer and features from intermediate layers processed by global average pooling (GAP). VS is then formulated as the problem of minimizing the reconstruction error over the fused multi-scale deep features. In this formulation, the contribution of each feature scale can be adjusted by a balance parameter, and the row-sparsity consistency of the simultaneous reconstruction coefficients is used to select as few keyframes as possible. The resulting MSDFF-SDS model is solved with an efficient greedy pursuit algorithm. Experimental results on two benchmark datasets show that the proposed MSDFF-SDS improves the F-score of keyframe-based summarization by more than 3% compared with existing SDS methods, and performs better than most deep-learning methods for skimming-based summarization.
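The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a simultaneous orthogonal matching pursuit style loop, column-normalized features, a per-scale weight list standing in for the balance parameter, and a relative-error stopping rule. The function names, the `weights` and `tol` parameters, and the toy data are all hypothetical.

```python
import numpy as np


def global_average_pool(feature_map):
    """Collapse a (channels, height, width) feature map to a (channels,) vector."""
    return feature_map.mean(axis=(1, 2))


def greedy_multiscale_selection(features, weights, max_keyframes, tol=1e-6):
    """Greedily pick frame indices that jointly reconstruct every feature scale.

    features: list of (d_s, n) arrays, one per scale; column j describes frame j.
    weights:  one non-negative weight per scale (assumed form of the balance parameter).
    """
    n = features[0].shape[1]
    # Normalize each frame's feature vector so that scales are comparable (assumption).
    feats = [X / (np.linalg.norm(X, axis=0, keepdims=True) + 1e-12) for X in features]
    residuals = [X.copy() for X in feats]
    total = sum(w * np.linalg.norm(X) ** 2 for w, X in zip(weights, feats))
    selected = []
    for _ in range(max_keyframes):
        # Score each candidate frame by its weighted correlation with all residuals.
        scores = np.zeros(n)
        for w, X, R in zip(weights, feats, residuals):
            scores += w * np.sum((X.T @ R) ** 2, axis=1)
        if selected:
            scores[selected] = -np.inf  # never pick the same frame twice
        selected.append(int(np.argmax(scores)))
        # Shared support across scales: refit every scale on the same selected frames.
        for i, X in enumerate(feats):
            D = X[:, selected]
            coef, *_ = np.linalg.lstsq(D, X, rcond=None)
            residuals[i] = X - D @ coef
        err = sum(w * np.linalg.norm(R) ** 2 for w, R in zip(weights, residuals))
        if err / total < tol:
            break
    return selected


# Toy data: 12 frames that are exact copies of 3 prototype frames, at two scales.
rng = np.random.default_rng(0)
labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
features = [rng.normal(size=(d, 3))[:, labels] for d in (8, 16)]
keyframes = greedy_multiscale_selection(features, weights=[1.0, 0.5], max_keyframes=5)
print(keyframes)
```

Because every scale is refit on the same set of selected frames, the reconstruction coefficients share their nonzero rows across scales, which is one simple way to realize the row-sparsity consistency the abstract mentions. On the toy data the loop stops once one frame per prototype has been chosen, since the remaining frames are then reconstructed exactly.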

Journal: Signal Processing: Image Communication
Publication Date: Jul 11, 2023
Citations: 1

Similar Papers

Multi-Scale Feature Fusion for Coal-Rock Recognition Based on Completed Local Binary Pattern and Convolution Neural Network.
Xiaoyang Liu ... Mingxuan Zhou
Entropy | VOL. 21
25 Jun 2019

Many heads are better than one: A multiscale neural information feature fusion framework for spatial route selections decoding from multichannel neural recordings of pigeons
Mengmeng Li ... Hong Wan
Brain Research Bulletin | VOL. 184
12 Mar 2022

A new ordered pooling network based on multi-scale fusion feature for medical image recognition
Kui Qian ... Lei Tian
26 Jul 2021

LiM-Net: Lightweight multi-level multiscale network with deep residual learning for automatic liver segmentation in CT images
Devidas T Kushnure ... Sanjay N Talbar
Biomedical Signal Processing and Control | VOL. 80
21 Oct 2022