Abstract
This paper presents a novel framework for human action recognition based on a newly proposed mid-level feature representation method named Lie Algebrized Gaussians (LAG). Because an action sequence can be treated as a 3D object in space-time, we cast action recognition as 3D object recognition and characterize each 3D object by the probability distribution of its local spatio-temporal features. First, for each video, we densely sample local spatio-temporal features (e.g. HOG3D) at multiple scales within the bounding boxes of the human body. Normalized spatial coordinates are appended to each local descriptor in order to capture spatial position information. The distribution of local features in each video is then modeled by a Gaussian Mixture Model (GMM). To estimate the parameters of the video-specific GMMs, a global GMM is trained on all training data and each video-specific GMM is adapted from the global GMM. LAG is then used to vectorize the video-specific GMMs. Finally, a linear SVM is employed for classification. Experimental results on the KTH and UCF Sports datasets show that our method achieves state-of-the-art performance.
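The first stages of the pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `append_normalized_coords` and `adapt_means`, the random stand-in descriptors, the diagonal covariances, and the relevance factor are all hypothetical. The abstract does not say how the video-specific GMMs are adapted, so the sketch assumes a standard relevance-MAP update of the component means. It also flattens the adapted means into a plain vector as a placeholder, which is not the LAG vectorization itself.

```python
import numpy as np

def append_normalized_coords(descriptors, positions, bbox):
    """Append bounding-box-normalized (x, y) coordinates to each local descriptor."""
    x0, y0, w, h = bbox  # assumed layout: top-left corner, width, height
    norm = (np.asarray(positions, dtype=float) - [x0, y0]) / [w, h]
    return np.hstack([np.asarray(descriptors, dtype=float), norm])

def adapt_means(features, weights, means, variances, relevance=16.0):
    """Adapt the means of a diagonal-covariance global GMM to one video.

    Assumption: relevance-MAP adaptation; the abstract only states that
    video-specific GMMs are adapted from the global GMM.
    """
    diff = features[:, None, :] - means[None, :, :]            # (T, K, D)
    log_prob = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2 * np.pi * variances), axis=2
    )                                                          # (T, K)
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)            # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                    # responsibilities
    n_k = post.sum(axis=0)                                     # soft counts, (K,)
    data_means = (post.T @ features) / np.maximum(n_k, 1e-12)[:, None]
    alpha = (n_k / (n_k + relevance))[:, None]                 # adaptation strength
    return alpha * data_means + (1.0 - alpha) * means

# Synthetic stand-ins for HOG3D descriptors and their sample positions.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(200, 8))
positions = rng.uniform([10, 20], [110, 220], size=(200, 2))
features = append_normalized_coords(descriptors, positions, bbox=(10, 20, 100, 200))

# Hypothetical global GMM parameters (in practice, trained on all training data).
K, D = 4, features.shape[1]
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
variances = np.ones((K, D))

# Placeholder vectorization: stacked adapted means, NOT the LAG mapping.
video_vector = adapt_means(features, weights, means, variances).ravel()
```

In a full system, each video would yield one such vector (with the LAG mapping in place of the plain stacking), and the vectors would be passed to a linear SVM for classification.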