Abstract
Human action recognition is an important and challenging task due to intra-class variation and complexity of actions which is caused by diverse style and duration in performed action. Previous works mostly concentrate on either depth or RGB data to build an understanding about the shape and movement cues in videos but fail to simultaneously utilize rich information in both channels. In this paper we study the problem of RGB-D action recognition from both RGB and depth sequences using kernel descriptors. Kernel descriptors provide an unified and elegant framework to turn pixel-level attributes into descriptive information about the performed actions in video. We show how using simple kernel descriptors over pixel attributes in video sequences achieves a great success compared to the state-of-the-art complex methods. Following the success of kernel descriptors (Bo, et al., 2010) on object recognition task, we put forward the claim that using 3D kernel descriptors could be an effective way to project the low-level features on 3D patches into a powerful structure which can effectively describe the scene. We build our system upon the 3D Gradient kernel descriptor and construct a hierarchical framework by employing efficient match kernel (EMK) (Bo, and Sminchisescu, 2009) and hierarchical kernel descriptors (HKD) as higher levels to abstract the mid-level features for classification. Through extensive experiments we demonstrate the proposed approach achieves superior performance on four standard RGB-D sequences benchmarks.
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have