Abstract
Human action recognition has attracted significant attention in recent years due to high demand across application domains. In this work, we propose a novel codebook generation and hybrid encoding scheme for the classification of action videos. The proposed scheme builds a discriminative codebook and a hybrid feature vector by encoding features extracted from convolutional neural networks (CNNs). We explore different CNN architectures for extracting spatio-temporal features. We employ an agglomerative clustering approach for codebook generation, which aims to combine the advantages of global and class-specific codebooks. We propose a Residual Vector of Locally Aggregated Descriptors (R-VLAD) and fuse it with locality-based coding to form a hybrid feature vector, which provides a compact representation along with higher-order statistics. We evaluated our work on two publicly available standard benchmark datasets, HMDB-51 and UCF-101. The proposed method achieves accuracies of 72.6% and 96.2% on HMDB-51 and UCF-101, respectively. We conclude that the proposed scheme is able to boost recognition accuracy for human action recognition.
Highlights
Human action recognition [1] is one of the active areas of research in computer vision. Action recognition systems can be used in many applications such as surveillance, human–computer interaction, content-based retrieval systems and video indexing.
Most of the earlier research in action recognition focused on hand-crafted features such as local space-time features [2], spatio-temporal features [3] and Motion Boundary Histograms (MBH) [4].
To combine the advantages of a global representation with those of a class-specific representation, we introduce the idea of clustering the global and class-specific codewords: k-means clustering is applied to the centroids of the global codebook C_G and the class-specific codebooks C_cs.
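The codeword-merging step described in the last highlight can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the helper names (`kmeans`, `merge_codebooks`), the codebook sizes, the feature dimension and the synthetic random features are all assumptions, since the text above does not specify them. A plain Lloyd's k-means written in NumPy stands in for whatever clustering routine was actually used.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns a (k, dim) array of centroids."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct input points.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids

def merge_codebooks(global_cb, class_cbs, k):
    """Cluster the stacked global and class-specific codewords into k codewords."""
    stacked = np.vstack([global_cb, *class_cbs])
    return kmeans(stacked, k)

# Synthetic stand-in for CNN features: 3 classes, 40 descriptors each, 16-D.
# All sizes here are illustrative assumptions.
rng = np.random.default_rng(42)
dim, n_classes = 16, 3
class_feats = [rng.normal(loc=c, size=(40, dim)) for c in range(n_classes)]

global_cb = kmeans(np.vstack(class_feats), k=8)   # global codebook C_G
class_cbs = [kmeans(f, k=4) for f in class_feats]  # class-specific codebooks C_cs
merged_cb = merge_codebooks(global_cb, class_cbs, k=10)
print(merged_cb.shape)
```

The merged codebook has fewer codewords than the 20 stacked inputs, so each final codeword summarises nearby global and class-specific centroids.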
Summary
Action recognition systems can be used in many applications such as surveillance, human–computer interaction, content-based retrieval systems and video indexing. Action recognition involves identifying human actions from video sequences. The task is challenging in the presence of background clutter, partial occlusion, and variations in scale, appearance and lighting. An important aspect of action recognition is extracting meaningful information from videos in the form of feature vectors. These feature vectors provide representations that help discriminate between different human actions. Most of the earlier research in action recognition focused on hand-crafted features such as local space-time features [2], spatio-temporal features [3] and Motion Boundary Histograms (MBH) [4].