Abstract

Understanding human actions and activities from video data is a growing field that has gained rapid importance due to applications in surveillance, security, entertainment, and personal logging. In this work, a new hybrid technique is proposed for describing human actions and activities in video sequences. The unified framework produces a robust feature vector that combines global and local information, strengthening the discriminative description used for action recognition. First, entropy-based texture segmentation is used to extract human silhouettes, followed by the construction of average energy silhouette images (AEIs). An AEI is a 2D projection of the binary human silhouette frames of a video sequence, which reduces the time complexity of feature vector generation. Spatial distribution gradients are computed at different resolution levels over sub-images of the AEI, capturing the overall shape variations of the human silhouette during the activity. Owing to the scale, rotation, and translation invariance of space-time interest points (STIPs), a vocabulary of DoG-based STIPs is created using vector quantization, yielding a representation that is distinctive for each activity class. Extensive experiments are conducted to validate the performance of the proposed approach on four standard benchmarks: Weizmann, KTH, Ballet Movements, and multi-view IXMAS. Promising results are obtained in comparison with similar state-of-the-art methods, demonstrating the robustness of the proposed hybrid feature vector against the different challenges posed by the datasets, such as illumination and view variations.
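The abstract describes the pipeline only at a high level. The sketch below illustrates, under assumptions not stated in the paper, how the two building blocks named in the abstract might be computed: an average energy silhouette image as the per-sequence mean of binary silhouette frames, and a STIP vocabulary obtained by k-means vector quantization of local descriptors. The function names (average_energy_image, build_vocabulary, bag_of_words_histogram), the descriptor dimensions, and the use of scikit-learn's KMeans are illustrative choices, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans


def average_energy_image(silhouette_frames):
    """Average binary silhouette frames of shape (T, H, W) into a single 2D AEI."""
    frames = np.asarray(silhouette_frames, dtype=np.float64)
    return frames.mean(axis=0)


def build_vocabulary(stip_descriptors, num_words=200, seed=0):
    """Vector-quantize local STIP descriptors of shape (N, D) into a visual vocabulary."""
    return KMeans(n_clusters=num_words, n_init=10, random_state=seed).fit(stip_descriptors)


def bag_of_words_histogram(vocabulary, stip_descriptors):
    """Normalized histogram of visual-word assignments for one video's descriptors."""
    words = vocabulary.predict(stip_descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(np.float64)
    return hist / max(hist.sum(), 1.0)


if __name__ == "__main__":
    # Toy data: 30 binary silhouette frames and 500 random 72-D local descriptors.
    rng = np.random.default_rng(0)
    frames = rng.integers(0, 2, size=(30, 64, 48))
    descriptors = rng.normal(size=(500, 72))

    aei = average_energy_image(frames)            # global shape cue
    vocab = build_vocabulary(descriptors, 50)     # local STIP vocabulary
    bow = bag_of_words_histogram(vocab, descriptors)
    print(aei.shape, bow.shape)                   # (64, 48) (50,)
```

Note that in the proposed method the global descriptor is built from spatial distribution gradients computed over sub-images of the AEI, not from the raw AEI pixels; the sketch stops at the two representations the abstract names explicitly.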
