Abstract

Because visual similarity is central to near-duplicate video detection, visual features are the primary basis for video fingerprint generation. However, the mutual relations among visual features and the discriminative power of semantic features have not been well explored in video fingerprinting. To address these issues, this paper proposes two layers of video fingerprints. We first generate a Low-level Representation Fingerprint (LRF) from handcrafted visual features using a tensor-based model, which captures the mutual relations among multiple visual features. We then use a Convolutional Neural Network to learn deep semantic features and generate a Deep Representation Fingerprint (DRF), which provides heterogeneous assistance to the LRF. As a result, the fingerprinting system exploits both the mutual relations among multiple handcrafted visual features and the assistance of semantic features. During the matching stage, DRF matching is performed first, followed by LRF matching. Experimental results show that the proposed method outperforms approaches that use either technique individually.
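The two-stage matching described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the binary fingerprints, the Hamming-distance comparison, and the threshold values are all assumptions made for the example.

```python
import numpy as np

def hamming_distance(a, b):
    """Fraction of differing bits between two binary fingerprints (assumed form)."""
    return np.count_nonzero(a != b) / a.size

def match(query_drf, query_lrf, ref_drf, ref_lrf,
          drf_thresh=0.25, lrf_thresh=0.25):
    """Two-stage matching: a coarse semantic (DRF) check gates the
    finer low-level (LRF) check. Thresholds are illustrative."""
    if hamming_distance(query_drf, ref_drf) > drf_thresh:
        return False  # semantic fingerprints too dissimilar; prune early
    return hamming_distance(query_lrf, ref_lrf) <= lrf_thresh
```

The coarse-to-fine order matters: the cheap semantic comparison filters out most non-duplicates before the low-level comparison is applied.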
