Abstract

This paper presents an effective local spatial-temporal descriptor for action recognition from skeleton sequences. The distinguishing property of our descriptor is that it accounts for both spatial-temporal discrimination and variations in action speed, so that distinguishing similar actions and recognizing actions performed at different speeds are addressed within a single framework. The algorithm consists of two stages. First, a frame selection method removes noisy skeletons from a given skeleton sequence. The joints of the selected skeletons are then mapped to a high-dimensional space, where each point encodes the kinematics, time label and joint label of one skeleton joint. To capture relative relationships among joints, pairs of points from this space are jointly mapped to a new space, where each point encodes the relative relationship between two skeleton joints. Second, a Fisher Vector (FV) is employed to encode all points from the new space as a compact feature representation. To cope with speed variations in actions, an energy-based temporal pyramid is applied to form a multi-temporal FV representation, which is fed into a kernel-based extreme learning machine classifier for recognition. Extensive experiments on benchmark datasets consistently show that our method outperforms state-of-the-art approaches for skeleton-based action recognition.
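
The first-stage mapping described above can be illustrated with a minimal sketch. The abstract does not specify the exact form of the kinematics or of the pairwise map, so everything here is an assumption made for illustration: raw 3D positions stand in for the kinematics, the time label is the frame index normalized to [0, 1], and the relative relationship is taken to be the position difference between two joints in the same frame. The function names `joint_points` and `pairwise_points` are hypothetical, and the frame selection and Fisher Vector stages are not shown.

```python
import numpy as np

def joint_points(seq):
    """Map a skeleton sequence of shape (T, J, 3) to (T*J, 5) points.

    Each row is [x, y, z, time_label, joint_label]. Using raw positions as
    the 'kinematics' is an assumption, not the paper's definition.
    """
    T, J, _ = seq.shape
    t = np.repeat(np.arange(T) / max(T - 1, 1), J)  # normalized frame index
    j = np.tile(np.arange(J), T)                    # joint index within frame
    return np.column_stack([seq.reshape(T * J, 3), t, j])

def pairwise_points(seq):
    """Map each unordered joint pair in each frame to one point.

    Each row is [dx, dy, dz, time_label, joint_a, joint_b]. The position
    difference is an assumed stand-in for the 'relative relationship'.
    """
    T, J, _ = seq.shape
    a, b = np.triu_indices(J, k=1)       # all pairs with a < b
    rel = seq[:, a, :] - seq[:, b, :]    # shape (T, P, 3)
    P = len(a)
    t = np.repeat(np.arange(T) / max(T - 1, 1), P)
    return np.column_stack([rel.reshape(T * P, 3), t, np.tile(a, T), np.tile(b, T)])

# Toy input: 2 frames, 3 joints
seq = np.arange(18, dtype=float).reshape(2, 3, 3)
pts = joint_points(seq)       # shape (6, 5)
pairs = pairwise_points(seq)  # shape (6, 6): 3 joint pairs per frame
```

In a full pipeline under these assumptions, the rows of `pairs` would be the local descriptors that the second stage encodes.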
