Abstract

Dictionary selection-based video summarization (VS) algorithms, in which keyframes are treated as a dictionary used to reconstruct all the video frames, have been demonstrated to be effective and efficient. It has been observed that the feature representation of video frames has a great impact on VS performance. In this paper, the influence of frame feature representation on the performance of dictionary selection-based VS is investigated for the first time. In addition to the traditional hand-crafted features used in VS, such as color histograms, deep features learned by deep neural networks are used for the first time to represent video frames for dictionary selection-based VS. The impact of applying dimensionality reduction to the high-dimensional deep features is further discussed. Experimental results on a benchmark video dataset demonstrate that deep features achieve better performance than traditional hand-crafted features for dictionary selection-based VS. Moreover, the dimensionality of the deep features can be reduced to lower the computational cost without degrading VS performance.
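The abstract does not state the exact objective, but dictionary selection methods for VS typically choose a subset S of frames whose features best reconstruct the full feature matrix, i.e. they approximately minimize a reconstruction error of the form ||X - X_S W||_F^2. The following is a minimal illustrative sketch of that pipeline, assuming a simple greedy least-squares selection rather than the paper's actual optimizer; the function name greedy_dictionary_selection, the feature dimensions, and the PCA-based dimensionality reduction step are assumptions for illustration, not details taken from the paper.

import numpy as np
from sklearn.decomposition import PCA

def greedy_dictionary_selection(X, k):
    """Greedily pick k frame indices (columns of X) whose features best
    reconstruct all frames in the least-squares sense, a simple stand-in
    for the dictionary-selection objective ||X - X_S W||_F^2."""
    n = X.shape[1]
    selected = []
    for _ in range(k):
        best_idx, best_err = None, np.inf
        for j in range(n):
            if j in selected:
                continue
            D = X[:, selected + [j]]                     # candidate dictionary
            W, *_ = np.linalg.lstsq(D, X, rcond=None)    # reconstruction weights
            err = np.linalg.norm(X - D @ W) ** 2         # squared Frobenius error
            if err < best_err:
                best_idx, best_err = j, err
        selected.append(best_idx)
    return selected

# Hypothetical usage: rows are feature dimensions, columns are video frames.
rng = np.random.default_rng(0)
deep_feats = rng.standard_normal((4096, 200))            # e.g. CNN fc-layer features
reduced = PCA(n_components=64).fit_transform(deep_feats.T).T  # reduce dimensionality
keyframes = greedy_dictionary_selection(reduced, k=5)
print("selected keyframe indices:", keyframes)

Swapping the random matrix for hand-crafted features (e.g. per-frame color histograms) or for real CNN activations, with and without the PCA step, mirrors the comparison the paper reports: the selection algorithm stays fixed while only the feature representation changes.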
