A Two-Stream CNN Framework for American Sign Language Recognition Based on Multimodal Data Fusion

Qing Gao, Honghai Liu, Jinguo Liu, Uchenna Emeoha Ogenyi, Zhaojie Ju

doi:10.1007/978-3-030-29933-0_9

Abstract

Vision-based hand gesture recognition plays an important role in human-robot interaction (HRI). This non-contact method enables natural and friendly interaction between people and robots. To support this technology, a two-stream CNN framework (2S-CNN) is proposed to recognize American Sign Language (ASL) hand gestures based on multimodal (RGB and depth) data fusion. First, the hand gesture data are enhanced to remove the influence of the background and noise. Second, RGB and depth features of the hand gestures are extracted by CNNs on two separate streams. Finally, a fusion layer is designed to fuse the recognition results of the two streams. By exploiting multimodal data, this method increases the recognition accuracy for ASL hand gestures. Experiments show that 2S-CNN reaches a recognition accuracy of 92.08% on the ASL fingerspelling database, higher than that of the baseline methods.
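The abstract names the three stages (data enhancement, per-stream feature extraction, fusion of the two streams' results) but not their exact form. The sketch below is illustrative only and rests on explicit assumptions: background removal is done by keeping pixels inside a depth range (the hypothetical `near`/`far` values in millimetres assume the hand is the object closest to the camera), noise is suppressed with a 3x3 median filter, and the fusion layer is taken to be a weighted average of the two streams' softmax scores. The helper names `segment_hand` and `fuse_scores` are hypothetical, and the CNN streams themselves are omitted; plain logit arrays stand in for their outputs.

```python
import numpy as np

def median_filter3(img: np.ndarray) -> np.ndarray:
    """3x3 median filter with edge padding (assumed noise-suppression step)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Stack the 9 shifted views of the image and take the per-pixel median.
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(3) for dx in range(3)])
    return np.median(windows, axis=0)

def segment_hand(rgb: np.ndarray, depth: np.ndarray,
                 near: float = 300.0, far: float = 700.0):
    """Hypothetical preprocessing: denoise the depth map, then keep only
    pixels whose depth (mm) lies in [near, far] as the hand region."""
    depth = median_filter3(depth.astype(np.float64))
    mask = (depth >= near) & (depth <= far)
    rgb_out = rgb * mask[..., None]           # zero out background in RGB
    depth_out = np.where(mask, depth, 0.0)    # zero out background in depth
    return rgb_out, depth_out, mask

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse_scores(rgb_logits: np.ndarray, depth_logits: np.ndarray,
                w_rgb: float = 0.5):
    """Assumed late (score-level) fusion: weighted average of the two
    streams' class probabilities; returns probabilities and predicted labels."""
    p = w_rgb * softmax(rgb_logits) + (1.0 - w_rgb) * softmax(depth_logits)
    return p, p.argmax(axis=-1)
```

For example, with RGB-stream logits `[[2.0, 1.0, 0.0]]` and depth-stream logits `[[0.0, 3.0, 0.0]]`, equal weighting lets the confident depth stream decide (class 1), while `w_rgb=1.0` falls back to the RGB stream alone (class 0). Score-level fusion of this kind lets each stream be trained on its own modality; the weight would typically be tuned on validation data.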
