Abstract

With emerging demand for large-scale video analysis, MPEG initiated the compact descriptor for video analysis (CDVA) standardization in 2014. Beyond handcrafted descriptors adopted by the current MPEG-CDVA reference model, we study the problem of deep learned global descriptors for video matching, localization, and retrieval. First, inspired by a recent invariance theory, we propose a nested invariance pooling (NIP) method to derive compact deep global descriptors from convolutional neural networks (CNNs), by progressively encoding translation, scale, and rotation invariances into the pooled descriptors. Second, our empirical studies have shown that a sequence of well designed pooling moments (e.g., max or average) may drastically impact video matching performance, which motivates us to design hybrid pooling operations via NIP (HNIP). HNIP has further improved the discriminability of deep global descriptors. Third, the technical merits and performance improvements by combining deep and handcrafted descriptors are provided to better investigate the complementary effects. We evaluate the effectiveness of HNIP within the well-established MPEG-CDVA evaluation framework. The extensive experiments have demonstrated that HNIP outperforms the state-of-the-art deep and canonical handcrafted descriptors with significant mAP gains of 5.5% and 4.7%, respectively. In particular the combination of HNIP incorporated CNN descriptors and handcrafted global descriptors has significantly boosted the performance of CDVA core techniques with comparable descriptor size.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.