Discriminatively Learned Hierarchical Rank Pooling Networks

Basura Fernando, Stephen Gould

doi:10.1007/s11263-017-1030-x

Abstract

Rank pooling is a temporal encoding method that summarizes the dynamics of a video sequence in a single vector and has shown good results for human action recognition in prior work. In this work, we present novel temporal encoding methods for action and activity classification by extending the unsupervised rank pooling method in two ways. First, we present discriminative rank pooling, in which the shared weights of our video representation and the parameters of the action classifiers are estimated jointly for a given training dataset of labelled vector sequences using a bilevel optimization formulation of the learning problem. When the frame-level feature vectors are obtained from a convolutional neural network (CNN), we rank pool the network activations and jointly estimate all parameters of the model, including CNN filters and fully-connected weights, in an end-to-end manner; we call this model the end-to-end trainable rank pooled CNN. Importantly, this model can make use of any existing CNN architecture (e.g., AlexNet or VGG) without modification or introduction of additional parameters. Second, we extend rank pooling to a high-capacity video representation called hierarchical rank pooling. Hierarchical rank pooling consists of a network of rank pooling functions, which encode temporal semantics over arbitrarily long video clips based on rich frame-level features. By stacking non-linear feature functions and temporal sub-sequence encoders one on top of the other, we build a high-capacity encoding network of the dynamic behaviour of the video. The resulting video representation is a fixed-length feature vector describing the entire video clip that can be used as input to standard machine learning classifiers. We demonstrate our approach on the task of action and activity recognition. We present a detailed analysis of our approach against competing methods and explore variants such as hierarchy depth and choice of non-linear feature function. The results are comparable to state-of-the-art methods on three important activity recognition benchmarks, with classification performance of 76.7% mAP on Hollywood2, 69.4% on HMDB51, and 93.6% on UCF101.
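
The two ideas named in the abstract, pooling a frame sequence into one vector and stacking such pooling over sub-sequences, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`rank_pool`, `signed_sqrt`, `hierarchical_rank_pool`), the window, stride, and depth values, and the signed square root non-linearity are assumptions made for illustration, and a ridge regression onto the frame index is swapped in for the ranking objective. The discriminative, end-to-end variant is not covered.

```python
import numpy as np


def rank_pool(frames, reg=1.0):
    """Encode a (T, D) sequence as one D-vector u whose scores u . v_t grow with t.

    Assumption: ridge regression onto the frame index stands in for the
    ranking machine; it is a simplification, not the paper's objective.
    """
    frames = np.asarray(frames, dtype=float)
    num_frames, dim = frames.shape
    t = np.arange(1, num_frames + 1, dtype=float)
    # Time-varying mean v_t = (1/t) * sum of the first t frames.
    v = np.cumsum(frames, axis=0) / t[:, None]
    # Closed-form ridge solution of (V^T V + reg * I) u = V^T t.
    gram = v.T @ v + reg * np.eye(dim)
    return np.linalg.solve(gram, v.T @ t)


def signed_sqrt(x):
    """One possible non-linear feature function (an assumption here)."""
    return np.sign(x) * np.sqrt(np.abs(x))


def hierarchical_rank_pool(frames, window=4, stride=2, depth=2, reg=1.0):
    """Stack rank pooling over sliding windows, then pool the whole sequence.

    Each intermediate layer maps overlapping sub-sequences to vectors,
    producing a shorter sequence that the next layer consumes.
    """
    seq = np.asarray(frames, dtype=float)
    for _ in range(depth - 1):
        if len(seq) < window:
            break  # too short to form another layer of windows
        starts = range(0, len(seq) - window + 1, stride)
        seq = np.stack(
            [signed_sqrt(rank_pool(seq[s:s + window], reg)) for s in starts]
        )
    # Final layer: a single fixed-length vector for the whole clip.
    return rank_pool(seq, reg)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.normal(size=(16, 3))  # 16 hypothetical frames, 3-D features
    print(hierarchical_rank_pool(clip).shape)
```

Whatever the clip length, the output has the same dimensionality as a single frame feature, which is what lets it feed a standard classifier.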

Journal: International Journal of Computer Vision
Publication Date: Jun 24, 2017
Citations: 17

Similar Papers

Rank Pooling for Action Recognition.
Basura Fernando ... Jose Oramas M
IEEE Transactions on Pattern Analysis and Machine Intelligence | VOL. 39
16 May 2016

Action Recognition with Dynamic Image Networks
Hakan Bilen ... Basura Fernando
IEEE Transactions on Pattern Analysis and Machine Intelligence | VOL. 40
02 Nov 2017

Hierarchical dynamic depth projected difference images–based action recognition in videos with convolutional neural networks
Hanbo Wu ... Xin Ma
International Journal of Advanced Robotic Systems | VOL. 16
01 Jan 2019

Rank pooling dynamic network: Learning end-to-end dynamic characteristic for action recognition
Zhigang Zhu ... Yiping Xu
Neurocomputing | VOL. 317
16 Aug 2018
