Abstract

Second-order representations, such as non-local operations and bilinear pooling, have significantly outperformed their plain counterparts on a wide variety of visual tasks. However, previous works model feature interactions either along the spatiotemporal dimensions or across channels, and thus ignore the joint effect of feature interactions along different axes. We therefore propose a general interaction-aware neural network that captures higher-order feature interactions in both the spatiotemporal and channel dimensions. In this paper, we illustrate how to implement second- and third-order exemplar CNNs in a compact way and evaluate their performance on action recognition benchmarks. Comprehensive experiments demonstrate that our method achieves competitive or better performance than recent state-of-the-art approaches, and visualization results illustrate that our scheme generates more discriminative representations that focus more accurately on target regions.
