Abstract

Dynamic hand skeletons, consisting of discrete spatial-temporal clouds of finger joints, effectively convey the intentions of communicators. Graph convolutional networks (GCNs), which rely on hand-crafted inductive biases, have rapidly gained traction for skeleton-based hand gesture recognition (SHGR). However, the graph constructions in most existing GCN-based solutions are set manually and consider only the physical topology of the hand skeleton; such fixed dependencies among hand joints may lead to suboptimal models. To enrich the local dependencies, we emphasize that hand skeletons can be viewed in two ways: as explicit joint clouds and as implicit skeleton topology. Starting from these two views, we introduce dynamics and diversity into the local neighborhood of the graph by dividing it into sets of physical neighbors, temporal neighbors, and varying neighbors. We then propose three innovations, namely a novel edge-varying graph, a normalized edge convolution operation, and a zig-zag sampling strategy, to address challenges arising in engineering practice. Finally, we construct spatial GCNs, called normalized edge convolutional networks, for hand gesture recognition. Experiments on publicly available hand gesture datasets show that our method achieves stable, state-of-the-art recognition performance, and ablation studies validate each contribution.
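To make the neighbor-set idea concrete, the sketch below shows one plausible way a partitioned spatial graph convolution over physical, temporal, and varying neighbor sets could be realized in PyTorch. It is only an illustration under our own assumptions: the class name PartitionedGraphConv, the tensor layout (batch, channels, frames, joints), and the degree normalization are ours, and the paper's actual normalized edge convolution and edge-varying graph may differ in detail.

```python
# Hypothetical sketch, NOT the authors' exact formulation: a spatial graph
# convolution that aggregates over three neighbor sets -- fixed physical bones,
# fixed temporal links, and a learnable "varying" adjacency.
import torch
import torch.nn as nn


class PartitionedGraphConv(nn.Module):
    def __init__(self, in_ch, out_ch, num_joints, a_physical, a_temporal):
        super().__init__()
        # Fixed (J, J) adjacencies from the skeleton topology; buffers, not trained.
        self.register_buffer("a_physical", a_physical)
        self.register_buffer("a_temporal", a_temporal)
        # Learnable edge weights: the "varying" neighbor set.
        self.a_varying = nn.Parameter(1e-3 * torch.randn(num_joints, num_joints))
        # One 1x1 convolution (per-joint feature transform) per neighbor set.
        self.transforms = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=1) for _ in range(3)
        )

    def forward(self, x):
        # x: (batch, channels, frames, joints)
        out = 0
        for adj, transform in zip(
            (self.a_physical, self.a_temporal, self.a_varying), self.transforms
        ):
            # Row-normalize so each joint averages over its neighbor set.
            deg = adj.abs().sum(dim=1, keepdim=True).clamp(min=1e-6)
            agg = torch.einsum("nctv,wv->nctw", x, adj / deg)
            out = out + transform(agg)
        return out


# Hypothetical usage: a 22-joint hand skeleton with identity placeholders
# standing in for the real physical and temporal adjacency matrices.
J = 22
layer = PartitionedGraphConv(3, 64, J, torch.eye(J), torch.eye(J))
y = layer(torch.randn(8, 3, 32, J))  # -> (8, 64, 32, 22)
```

Summing the three transformed aggregations keeps each neighbor set's contribution separable, which is one common way such partition-based GCN layers are built; the paper's normalization and sampling strategy would replace the simple degree scaling used here.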
