Abstract

Gestures have long been recognized as an interaction technique that can provide a more natural, creative, and intuitive way to communicate with computers. However, several difficulties remain: the same movement performed at different speeds is likely to be recognized as a different category of movement; backgrounds may be cluttered, occluded, or of low resolution; and heterogeneous features are difficult to fuse. To this end, we propose a novel framework that integrates RGB and motion-skeleton information at different scales to achieve higher recognition accuracy from multiple features. Specifically, we present a network architecture that combines a three-dimensional convolutional neural network (3DCNN) with post-fusion to better embed the different features, and we combine RGB and motion-skeleton information at multiple scales to mitigate the speed and background issues. Experiments on several public gesture recognition datasets yield strong results, validating the superiority of the proposed method. Finally, we conduct a human-computer interaction experiment to demonstrate its practicality.
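The post-fusion (late fusion) idea mentioned above can be illustrated with a minimal sketch: each stream (RGB and motion skeleton) produces its own per-class scores, and the scores are combined only at the decision level. The function names, the softmax-then-weighted-average scheme, and the fusion weight below are illustrative assumptions, not the paper's actual implementation.

```python
import math

def softmax(scores):
    """Convert one stream's raw class scores to probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def late_fusion(rgb_scores, skeleton_scores, w_rgb=0.5):
    """Score-level (post) fusion: weighted average of per-stream probabilities."""
    p_rgb = softmax(rgb_scores)
    p_skel = softmax(skeleton_scores)
    return [w_rgb * a + (1.0 - w_rgb) * b for a, b in zip(p_rgb, p_skel)]

def predict(rgb_scores, skeleton_scores, w_rgb=0.5):
    """Fused class prediction: argmax over the combined probabilities."""
    fused = late_fusion(rgb_scores, skeleton_scores, w_rgb)
    return max(range(len(fused)), key=fused.__getitem__)

# Example: the RGB stream favors class 1, the skeleton stream favors class 2;
# with equal weights the fused decision here is class 1.
label = predict([1.0, 3.0, 0.5], [0.5, 1.0, 2.5])
```

Because each stream is normalized independently before averaging, a confident prediction from one modality can compensate for an ambiguous one from the other, which is the usual motivation for fusing at the score level rather than concatenating raw features.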
