Abstract
Human action recognition is one of the most challenging tasks in computer vision, and the recent development of depth sensors has created new opportunities in this field. In this paper, a novel supervised spatio-temporal kernel descriptor (SSTKDes) is proposed to build a discriminative and compact feature representation of actions from RGB-depth videos. To enhance the descriptive and discriminative ability of the descriptor, the extracted primary kernel-based features are transformed into a new space using a supervised training strategy, namely large margin nearest neighbor (LMNN). LMNN substantially reduces the error of a nearest neighbor classifier by minimizing intra-class variations and maximizing inter-class distances. Subsequently, the efficient match kernel (EMK) is used to abstract the mid-level kernel features for more efficient classification. The proposed approach is evaluated on five public benchmark datasets. The experimental evaluations demonstrate that the proposed method outperforms state-of-the-art methods.
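As a rough illustration of the LMNN idea mentioned above, the following sketch evaluates the standard LMNN objective for a given linear map: a "pull" term that shrinks distances to same-class target neighbors and a hinge "push" term that penalizes differently labeled points falling within a unit margin of those neighbors. This is not the paper's implementation; the function name `lmnn_loss`, the parameters `k` and `mu`, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def lmnn_loss(L, X, y, k=1, mu=0.5):
    """Evaluate the LMNN objective for a linear map L (illustrative sketch).

    L : (d_out, d_in) linear transformation (hypothetical name)
    X : (n, d_in) feature vectors, y : (n,) integer class labels
    k : number of target neighbors per point (assumed value)
    mu: trade-off between the pull and push terms (assumed value)
    """
    n = X.shape[0]
    # Squared distances in the input space, used to pick target neighbors.
    d_in = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # Squared distances after applying the linear map.
    Z = X @ L.T
    d = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)

    pull, push = 0.0, 0.0
    for i in range(n):
        same = np.flatnonzero((y == y[i]) & (np.arange(n) != i))
        diff = np.flatnonzero(y != y[i])
        # Target neighbors: the k nearest same-class points in input space.
        targets = same[np.argsort(d_in[i, same])[:k]]
        for j in targets:
            # Pull term: keep target neighbors close (intra-class variation).
            pull += d[i, j]
            # Push term: hinge penalty for impostors inside the unit margin.
            push += np.maximum(0.0, 1.0 + d[i, j] - d[i, diff]).sum()
    return (1.0 - mu) * pull + mu * push

# Toy data: the first feature separates the classes, the second is noise.
X = np.array([[0.0, 0.0], [0.0, 10.0], [1.0, 0.0], [1.0, 10.0]])
y = np.array([0, 0, 1, 1])

loss_identity = lmnn_loss(np.eye(2), X, y)
loss_scaled = lmnn_loss(np.diag([2.0, 0.0]), X, y)
```

On this toy data, a map that stretches the informative feature and suppresses the noisy one yields a lower loss than the identity map, which is the behavior an LMNN solver searches for when it optimizes over `L`.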