Hand sign language recognition using multi-view hand skeleton

Razieh Rastgoo, Kourosh Kiani, Sergio Escalera

doi:10.1016/j.eswa.2020.113336

https://doi.org/10.1016/j.eswa.2020.113336

Abstract

Hand sign language recognition from video is a challenging research area in computer vision, whose performance is affected by hand occlusion, fast hand movement, illumination changes, and background complexity, to mention a few. In recent years, deep learning approaches have achieved state-of-the-art results in the field, though these challenges are not completely solved. In this work, we propose a novel deep learning-based pipeline architecture for efficient automatic hand sign language recognition using a Single Shot Detector (SSD), a 2D Convolutional Neural Network (2DCNN), a 3D Convolutional Neural Network (3DCNN), and Long Short-Term Memory (LSTM) from RGB input videos. We use a CNN-based model that estimates the 3D hand keypoints from 2D input frames. We then connect these estimated keypoints to build the hand skeleton using the midpoint algorithm. To obtain a more discriminative representation of hands, we project the 3D hand skeleton into three-view surface images. We further employ the heatmap image of detected keypoints as input for refinement in a stacked fashion. We apply 3DCNNs to the stacked features of the hand, including pixel-level, multi-view hand skeleton, and heatmap features, to extract discriminant local spatio-temporal features from these stacked inputs. The outputs of the 3DCNNs are fused and fed to an LSTM to model long-term dynamics of hand sign gestures. Analyzing 2DCNN vs. 3DCNN using different numbers of stacked inputs into the network, we demonstrate that the 3DCNN better captures the spatio-temporal dynamics of hands. To the best of our knowledge, this is the first time that this multi-modal and multi-view set of hand skeleton features has been applied to hand sign language recognition. Furthermore, we present a new large-scale hand sign language dataset, namely RKS-PERSIANSIGN, including 10,000 RGB videos of 100 Persian sign words. Evaluation results of the proposed model on three datasets, NYU, First-Person, and RKS-PERSIANSIGN, indicate that our model outperforms state-of-the-art models in hand sign language recognition, hand pose estimation, and hand action recognition.
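The skeleton-building and multi-view projection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`draw_line`, `project_views`), the shared min-max normalization, the image size, and the toy keypoints and bone list are all assumptions made for illustration. Bones are rasterized with an integer Bresenham-style line routine, used here as a stand-in for the midpoint algorithm the abstract mentions, and the three views are taken to be the xy, xz, and yz coordinate planes.

```python
from typing import Dict, List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Image = List[List[int]]


def draw_line(img: Image, x0: int, y0: int, x1: int, y1: int) -> None:
    """Set pixels along a segment using an integer Bresenham-style line walk."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        img[y0][x0] = 1  # rows index the second coordinate, columns the first
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def project_views(
    keypoints: Sequence[Point3D],
    bones: Sequence[Tuple[int, int]],
    size: int = 32,
) -> Dict[str, Image]:
    """Project a 3D skeleton onto the xy, xz, and yz planes as binary images."""
    # Assumption: one shared min-max scale maps every coordinate to [0, size - 1].
    lo = min(min(p) for p in keypoints)
    hi = max(max(p) for p in keypoints)
    scale = (size - 1) / (hi - lo) if hi > lo else 0.0
    pix = [tuple(int(round((c - lo) * scale)) for c in p) for p in keypoints]

    views: Dict[str, Image] = {}
    for name, (a, b) in {"xy": (0, 1), "xz": (0, 2), "yz": (1, 2)}.items():
        img = [[0] * size for _ in range(size)]
        for i, j in bones:
            # Drop the third axis and connect the two joints in this plane.
            draw_line(img, pix[i][a], pix[i][b], pix[j][a], pix[j][b])
        views[name] = img
    return views


# Toy example (hypothetical data): four joints connected in a chain.
toy_keypoints = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
toy_bones = [(0, 1), (1, 2), (2, 3)]
views = project_views(toy_keypoints, toy_bones, size=8)
```

In a pipeline like the one described, each of the three images could be stacked with the pixel-level and heatmap inputs before the 3DCNN stage; how the stacking is done in the paper is not specified in the abstract.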

Journal: Expert Systems with Applications
Publication Date: Feb 22, 2020
Citations: 103

Similar Papers

Hand Sign Recognition System Based on EIT Imaging and Robust CNN Classification
Bilel Ben Atitallah ... Zheng Hu
IEEE Sensors Journal | VOL. 22
15 Jan 2022

Human-robot interaction with multi-sensor fusion based hand sign recognition for service robot
Ren C Luo ... Yen-Chang Wu
01 Oct 2012

Video-based isolated hand sign language recognition using a deep cascaded model
Razieh Rastgoo ... Kourosh Kiani
Multimedia Tools and Applications | VOL. 79
02 Jun 2020

Real-time isolated hand sign language recognition using deep networks and SVD
Razieh Rastgoo ... Kourosh Kiani
Journal of Ambient Intelligence and Humanized Computing | VOL. 13
16 Feb 2021
