Abstract

Multiclass action detection in complex scenes is challenging because of cluttered backgrounds and large intra-class variations within each type of action. To achieve efficient and robust action detection, we characterize a video as a collection of spatio-temporal interest points and locate actions by finding the spatio-temporal video subvolumes with the highest mutual information score for each action class. A random forest is constructed to efficiently generate discriminative votes from individual interest points, and a fast top-K subvolume search algorithm is developed to find all action instances in a single round of search. Such a top-K search can be performed on down-sampled score volumes for more efficient localization without significantly degrading performance. Experiments on the challenging MSR Action Dataset II validate the effectiveness of the proposed multiclass action detection method. Detection is several orders of magnitude faster than with existing methods.
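To make the subvolume-search idea concrete, here is a minimal sketch in Python. It is not the fast top-K algorithm described above: it finds only the single best subvolume, using an exhaustive scan over temporal and vertical extents with Kadane's algorithm along the horizontal axis. The function names `max_subvolume` and `downsample`, the `scores[t][y][x]` layout, and the choice to down-sample by summing spatial blocks are all assumptions made for illustration.

```python
def max_subvolume(scores):
    """Return (best_sum, (t0, t1, y0, y1, x0, x1)), bounds inclusive.

    scores[t][y][x] is a per-voxel vote for one action class:
    positive values support the class, negative values oppose it.
    """
    T, Y, X = len(scores), len(scores[0]), len(scores[0][0])
    best_sum, best_box = float("-inf"), None
    for t0 in range(T):
        # plane[y][x] accumulates the scores over frames t0..t1.
        plane = [[0.0] * X for _ in range(Y)]
        for t1 in range(t0, T):
            for y in range(Y):
                for x in range(X):
                    plane[y][x] += scores[t1][y][x]
            for y0 in range(Y):
                # row[x] accumulates plane over rows y0..y1.
                row = [0.0] * X
                for y1 in range(y0, Y):
                    for x in range(X):
                        row[x] += plane[y1][x]
                    # Kadane's scan: best contiguous run along x.
                    run, start = 0.0, 0
                    for x in range(X):
                        if run <= 0:
                            run, start = row[x], x
                        else:
                            run += row[x]
                        if run > best_sum:
                            best_sum = run
                            best_box = (t0, t1, y0, y1, start, x)
    return best_sum, best_box


def downsample(scores, s):
    """Sum scores over s-by-s spatial blocks in every frame (assumed scheme)."""
    T, Y, X = len(scores), len(scores[0]), len(scores[0][0])
    out = [[[0.0] * ((X + s - 1) // s) for _ in range((Y + s - 1) // s)]
           for _ in range(T)]
    for t in range(T):
        for y in range(Y):
            for x in range(X):
                out[t][y // s][x // s] += scores[t][y][x]
    return out
```

Calling `max_subvolume(downsample(scores, 2))` searches a volume with a quarter as many voxels per frame, at the cost of coarser spatial bounds. Coordinates returned for the down-sampled volume must be multiplied by the factor to map back to the original resolution.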
