Abstract

Gesture recognition has attracted great attention owing to its applications in many fields such as Human Computer Interaction. However, in video-based gesture recognition, some gesture-irrelevant factors like the background handicap the improvement of recognition rate. In this paper, we propose an effective 3D Convolutional Neural Network based method for large-scale gesture recognition using RGB-D video data. To obtain compact but with sufficient motion path information data for the network, the inputs are unified into 32-frame videos first. Then the optical flow images are constructed from the RGB videos frame by frame, to help with eliminating the disturbing background inside them. After that, the spatiotemporal features of de-background RGB and depth data are extracted with the C3D model (a 3D CNN model) respectively and blended together in the next stage according to the discriminant correlation analysis to boost the performance. Finally the classes are predicted with a linear SVM classifier. Our proposed method achieves 54.50% accuracy on the validation subset and 60.93% on the testing subset of the Chalearn LAP IsoGD dataset, both of which outperform our results (ranked 1st place) in the Chalearn LAP Large-scale Gesture Recognition Challenge.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call