Abstract

In this paper, we present a new method for video action recognition. The main contributions are twofold. First, we propose local coordinates contained descriptors (LCCD) in place of appearance-only descriptors. By combining descriptors with their spatio-temporal locations, we encode global geometric correspondence; unlike previous methods such as spatio-temporal pyramid matching (STPM), LCCD makes spatio-temporal location part of the coding step itself. Second, we develop a novel non-negative low-rank and sparse coding model to encode descriptors for action recognition. The model is motivated by low-rank matrix recovery and completion: local descriptors in a spatio-temporal neighborhood are similar, so the matrix they form should be approximately low rank. The objective function seeks non-negative, low-rank, and sparse coefficients for local descriptors. The learned coefficients capture both location information and the structure of the descriptors, and hence improve the discriminability of the representations. Experiments show that our method achieves state-of-the-art results on two benchmark datasets.
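
To make the idea of combining descriptors with spatio-temporal locations concrete, the following is a minimal sketch, not the authors' implementation. It assumes an LCCD can be formed by appending normalized (x, y, t) coordinates to each appearance descriptor. The function name `build_lccd`, the `weight` parameter that balances location against appearance, the 96-dimensional descriptors, and the video size are all hypothetical choices made for illustration.

```python
import numpy as np

def build_lccd(descriptors, locations, video_shape, weight=1.0):
    """Append normalized (x, y, t) coordinates to each local descriptor.

    Illustrative assumption: the abstract only states that descriptors are
    combined with spatio-temporal locations; plain concatenation with a
    scalar weight is one simple way to do that.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    locations = np.asarray(locations, dtype=float)
    # Divide each coordinate by the video extent (width, height, frames)
    # so that location values are on a comparable scale across videos.
    extent = np.asarray(video_shape, dtype=float)
    normalized = locations / extent
    # One row per local feature: [appearance | weighted location].
    return np.hstack([descriptors, weight * normalized])

# Hypothetical data: 5 local features with 96-dimensional appearance descriptors.
rng = np.random.default_rng(0)
appearance = rng.random((5, 96))
positions = np.array([
    [10, 20, 3],
    [40, 60, 8],
    [80, 30, 15],
    [120, 90, 22],
    [159, 119, 29],
])

lccd = build_lccd(appearance, positions, video_shape=(160, 120, 30), weight=0.5)
print(lccd.shape)
```

Because location is part of every descriptor, any subsequent coding step operates on appearance and position jointly, rather than handling position through a separate spatial pooling stage.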

