HNIP: Compact Deep Invariant Representations for Video Matching, Localization, and Retrieval

Jie Lin, Vijay Chandrasekhar, Yan Bai, Yihang Lou, Tiejun Huang, Ling-Yu Duan, Wen Gao, Alex Kot, Shiqi Wang

doi:10.1109/tmm.2017.2713410

Abstract

With emerging demand for large-scale video analysis, MPEG initiated the compact descriptor for video analysis (CDVA) standardization in 2014. Beyond the handcrafted descriptors adopted by the current MPEG-CDVA reference model, we study the problem of deep learned global descriptors for video matching, localization, and retrieval. First, inspired by a recent invariance theory, we propose a nested invariance pooling (NIP) method to derive compact deep global descriptors from convolutional neural networks (CNNs) by progressively encoding translation, scale, and rotation invariances into the pooled descriptors. Second, our empirical studies show that a well-designed sequence of pooling moments (e.g., max or average) may drastically affect video matching performance, which motivates us to design hybrid pooling operations via NIP (HNIP). HNIP further improves the discriminability of deep global descriptors. Third, we analyze the technical merits and performance improvements of combining deep and handcrafted descriptors to better investigate their complementary effects. We evaluate the effectiveness of HNIP within the well-established MPEG-CDVA evaluation framework. Extensive experiments demonstrate that HNIP outperforms the state-of-the-art deep and canonical handcrafted descriptors, with significant mAP gains of 5.5% and 4.7%, respectively. In particular, combining HNIP-based CNN descriptors with handcrafted global descriptors significantly boosts the performance of CDVA core techniques at a comparable descriptor size.
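The nesting of translation, scale, and rotation pooling can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: each pooling stage is modeled as a channel-wise order-n moment (n = 1 is average pooling, n = infinity is max pooling), and the input is assumed to be precomputed, non-negative CNN activations grouped first by rotated copy of a frame and then by multi-scale region. The function names `pool_moment`, `nested_invariance_pool`, and `match_score`, the toy data, and the particular hybrid choice of moments are hypothetical illustrations, not the authors' implementation or the HNIP configuration reported in the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def pool_moment(vectors: Sequence[Vector], n: float) -> Vector:
    """Channel-wise order-n moment ((1/|G|) * sum x^n)^(1/n) over a set of vectors.

    n = 1 gives average pooling, n = 2 root-mean-square pooling, and
    n = math.inf max pooling. Activations are assumed non-negative
    (e.g. post-ReLU) so the n-th root is well defined.
    """
    if not vectors:
        raise ValueError("cannot pool an empty set of vectors")
    dim = len(vectors[0])
    if math.isinf(n):
        return [max(v[c] for v in vectors) for c in range(dim)]
    count = len(vectors)
    return [
        (sum(v[c] ** n for v in vectors) / count) ** (1.0 / n)
        for c in range(dim)
    ]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def nested_invariance_pool(features, n_translation: float, n_scale: float,
                           n_rotation: float) -> Vector:
    """Hypothetical nested pooling in the order translation -> scale -> rotation.

    features[r][k] is the list of C-dim activation vectors at the spatial
    positions inside region k (regions sampled at several scales) of the
    feature map computed for the r-th rotated copy of a frame.
    """
    per_rotation = []
    for regions in features:
        # Translation: pool over spatial positions inside each region.
        region_vecs = [pool_moment(positions, n_translation) for positions in regions]
        # Scale: pool over the multi-scale regions of this rotated copy.
        per_rotation.append(pool_moment(region_vecs, n_scale))
    # Rotation: pool over rotated copies, then L2-normalize for cosine matching.
    return l2_normalize(pool_moment(per_rotation, n_rotation))


def match_score(a: Vector, b: Vector) -> float:
    """Cosine similarity of two L2-normalized global descriptors."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    # Toy input: 2 rotated copies, each with 2 regions of 3-channel activations.
    frame = [
        [[[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]], [[2.0, 2.0, 0.0]]],
        [[[0.0, 1.0, 1.0], [4.0, 0.0, 0.0]], [[1.0, 1.0, 1.0], [0.0, 2.0, 3.0]]],
    ]
    average_only = nested_invariance_pool(frame, 1, 1, 1)
    hybrid = nested_invariance_pool(frame, math.inf, 1, 2)  # illustrative mix only
    print("average-only:", average_only)
    print("hybrid:", hybrid)
    print("self-match:", match_score(hybrid, hybrid))
```

Because each stage pools over an unordered set (positions, regions, or rotated copies), reordering the elements of any set leaves the descriptor unchanged; a hybrid variant simply assigns a different moment to each stage.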

Journal: IEEE Transactions on Multimedia
Publication Date: Sep 1, 2017
Citations: 90

Similar Papers

Compact Deep Invariant Descriptors for Video Retrieval
Yihang Lou ... Wen Gao
01 Apr 2017

A practical guide to CNNs and Fisher Vectors for image instance retrieval
Vijay Chandrasekhar ... Antoine Veillard
Signal Processing | VOL. 128 | 22 May 2016

Scale and Rotation Corrected CNNs (SRC-CNNs) for Scale and Rotation Invariant Character Recognition
Swetha V C ... Deepak Mishra
18 Dec 2018

Deep regional feature pooling for video matching
Yan Bai ... Tiejun Huang
01 Sep 2017
