Abstract

Locating human actions in the spatio-temporal domain of untrimmed videos is an important but challenging task. Recent work has shown that incorporating contextual information leads to significant improvements in action recognition, yet no existing work takes full advantage of context for action localization. While the popular target-centered methods have achieved promising results, they fail to exploit context and to capture the temporal dynamics of actions. In this paper, we propose a principled dynamic model, called the spatio-temporal context model (STCM), to simultaneously localize and recognize actions. The STCM integrates several kinds of context, including the temporal context formed by the sequences before and after an action as well as the spatial context surrounding the target. In addition, a novel dynamic programming approach accumulates evidence collected at a small set of candidates to detect the spatio-temporal location of an action both effectively and efficiently. We report encouraging results on the UCF-Sports and UCF-101 datasets, demonstrating that contextual information is not only helpful for action recognition but also contributes to action localization.
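To make the dynamic programming idea concrete, the sketch below shows one common way such evidence accumulation can be implemented for spatio-temporal localization: per-frame candidate boxes are linked across frames by a Viterbi-style recurrence that trades off detection confidence against spatial overlap between consecutive frames. This is only an illustrative assumption about the linking score (confidence plus IoU, weighted by a hypothetical parameter `lam`); the actual STCM formulation and its context terms are defined in the paper itself.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def link_candidates(boxes, scores, lam=1.0):
    """Viterbi-style dynamic programming over per-frame candidates.

    boxes[t]  : list of candidate boxes in frame t
    scores[t] : confidence of each candidate in frame t
    Returns one candidate index per frame, maximizing the sum of
    per-frame scores plus lam * IoU between consecutive boxes.
    """
    T = len(boxes)
    dp = [np.asarray(scores[0], dtype=float)]   # best accumulated score per candidate
    back = []                                   # back-pointers for traceback
    for t in range(1, T):
        cur = np.full(len(boxes[t]), -np.inf)
        ptr = np.zeros(len(boxes[t]), dtype=int)
        for j, bj in enumerate(boxes[t]):
            for i, bi in enumerate(boxes[t - 1]):
                val = dp[t - 1][i] + scores[t][j] + lam * iou(bi, bj)
                if val > cur[j]:
                    cur[j], ptr[j] = val, i
        dp.append(cur)
        back.append(ptr)
    # Trace back the best-scoring path from the last frame.
    path = [int(np.argmax(dp[-1]))]
    for t in range(T - 2, -1, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```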
