One-shot learning gesture recognition based on improved 3D SMoSIFT feature descriptor from RGB-D videos

Jia Lin, Naigong Yu, Xiaogang Ruan, Ruoyan Wei

doi:10.1109/ccdc.2015.7162803

Abstract

To meet the need for distinctive feature extraction in one-shot learning gesture recognition for mobile robot control, an improved three-dimensional local sparse motion scale-invariant feature transform (3D SMoSIFT) feature descriptor is proposed, which fuses the color and depth information of RGB-D videos. First, a gray pyramid, a depth pyramid, and optical flow pyramids are built as the scale space for each gray frame (converted from the RGB frame) and each depth frame. Interest regions are then extracted according to the variance of the optical flow, which is calculated in the horizontal and vertical directions. Next, corners are extracted only within each interest region as interest points, and the gray and depth optical flow information is used jointly to detect robust keypoints around the motion pattern in the scale space. Finally, SIFT descriptors are calculated in 3D gradient space and 3D motion space. The improved feature descriptor is evaluated with a bag-of-features model on the one-shot learning ChaLearn Gesture Dataset. Experiments demonstrate that the proposed method clearly improves the accuracy of gesture recognition. The results also show that the improved 3D SMoSIFT feature descriptor outperforms other spatiotemporal feature descriptors and is comparable to state-of-the-art approaches.
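The interest-region step can be illustrated with a minimal sketch. It is not the authors' implementation: the grid-cell layout, the function name `interest_cells`, and the parameters `cell` and `thresh` are assumptions made for illustration, and the abstract does not say how the variance is thresholded. The sketch takes a dense optical flow field, computes the variance of its horizontal and vertical components inside each cell, and keeps the cells where either variance is large.

```python
import numpy as np

def interest_cells(flow, cell=8, thresh=0.5):
    """Mark grid cells whose optical-flow variance is high.

    flow: array of shape (H, W, 2) with horizontal (u) and vertical (v) flow.
    cell, thresh: hypothetical parameters, not taken from the paper.
    """
    h, w, _ = flow.shape
    gh, gw = h // cell, w // cell
    # Crop to a whole number of cells, then split into (gh, cell, gw, cell, 2).
    blocks = flow[:gh * cell, :gw * cell].reshape(gh, cell, gw, cell, 2)
    # Variance of u and v inside each cell, giving shape (gh, gw, 2).
    var = blocks.var(axis=(1, 3))
    # Keep a cell if the flow varies enough in either direction.
    return (var[..., 0] > thresh) | (var[..., 1] > thresh)

def demo():
    rng = np.random.default_rng(0)
    flow = np.zeros((32, 32, 2))
    # Simulated motion: noisy flow in the top-left 16x16 patch only.
    flow[:16, :16] = rng.normal(0.0, 2.0, size=(16, 16, 2))
    return interest_cells(flow)
```

In a pipeline of the kind the abstract describes, a corner detector would then run only inside the cells marked `True`, so static background contributes no interest points.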
