Abstract
We propose a novel approach to model the spatio-temporal distribution of local features for action recognition in videos. The proposed approach is based on the Lie Algebrized Gaussians (LAG), a feature aggregation method that yields a high-dimensional video signature. In the LAG framework, local features extracted from a video are aggregated to train a video-specific Gaussian Mixture Model (GMM). The video-specific GMM is then encoded as a vector based on Lie group theory, a step also referred to as GMM vectorization. Since the video-specific GMM provides a soft partition of the feature space, for each cell of the feature space (i.e., each Gaussian component) we use a GMM to model the spatio-temporal locations of the local features assigned to that component. These location GMMs are encoded as vectors in the same way as the local feature GMM. We term the resulting vectors of location GMMs the spatio-temporal LAG (STLAG). In addition, although the LAG and the popular Fisher Vector (FV) are derived from distinct theoretical perspectives, we find that they are closely related. Hence the power and l2 normalization proposed for the FV are also beneficial to the LAG. Experimental results show that STLAG is very effective at modeling spatio-temporal layout compared with other techniques such as the spatio-temporal pyramid and feature augmentation. Using state-of-the-art dense trajectory features, our approach achieves state-of-the-art performance on two challenging datasets: Hollywood2 and HMDB51.
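As an informal illustration of the pipeline summarized above, the sketch below fits a video-specific GMM to local descriptors, fits a small location GMM per Gaussian component, and applies power and l2 normalization to the resulting vectors. All names, the use of scikit-learn's GaussianMixture, and the simple parameter-concatenation vectorization (a placeholder standing in for the paper's Lie-algebra-based GMM encoding) are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical inputs: N local descriptors and their (x, y, t) locations.
descriptors = np.random.rand(500, 64)   # e.g. dense trajectory descriptors
locations   = np.random.rand(500, 3)    # normalized spatio-temporal coordinates

K_FEAT, K_LOC = 8, 2                    # illustrative mixture sizes


def power_l2_normalize(v, alpha=0.5):
    """Signed power normalization followed by l2 normalization (as used for the FV)."""
    v = np.sign(v) * np.abs(v) ** alpha
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def vectorize_gmm(gmm):
    """Placeholder vectorization: concatenate GMM parameters.
    The actual LAG encodes the GMM via Lie group theory, which is not reproduced here."""
    return np.concatenate([gmm.weights_.ravel(),
                           gmm.means_.ravel(),
                           gmm.covariances_.ravel()])


# 1. Fit a video-specific GMM on the local descriptors (the aggregation step).
feat_gmm = GaussianMixture(n_components=K_FEAT, covariance_type='diag').fit(descriptors)
lag_vector = power_l2_normalize(vectorize_gmm(feat_gmm))

# 2. For each feature-space cell (Gaussian component), model the spatio-temporal
#    locations of the descriptors assigned to it with a small location GMM (STLAG).
assignments = feat_gmm.predict(descriptors)
LOC_DIM = K_LOC * (1 + 2 * locations.shape[1])   # weights + means + diag covariances
location_vectors = []
for k in range(K_FEAT):
    locs_k = locations[assignments == k]
    if len(locs_k) < K_LOC:                      # too few points: keep a zero vector
        location_vectors.append(np.zeros(LOC_DIM))
        continue
    loc_gmm = GaussianMixture(n_components=K_LOC, covariance_type='diag').fit(locs_k)
    location_vectors.append(power_l2_normalize(vectorize_gmm(loc_gmm)))

# 3. The final video signature concatenates the feature-GMM vector with the
#    per-component location-GMM vectors.
stlag_signature = np.concatenate([lag_vector] + location_vectors)
```

In this sketch the signature would then be fed to a linear classifier; the per-component location GMMs are what distinguish STLAG from encoding the feature-space GMM alone.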