In this paper, a new deep learning neural network structure is introduced to identify static hand gestures in sign language. The proposed structure combines a convolutional neural network (CNN) with classical, non-learned feature extraction methods. After preprocessing and background removal, the hand gesture image passes through three different feature extraction streams, so that effective features are extracted and the hand gesture class is determined. These three streams, each of which independently extracts its own specific features, consist of three methods widely used in hand gesture classification: a CNN, a Gabor filter, and the ORB feature descriptor. The resulting features are then merged to form the final feature vector. By combining these efficient methods, the proposed structure not only achieves very high classification accuracy but also becomes more robust to uncertainties such as rotation and ambiguity in the hand gestures. Another prominent feature of the proposed structure, compared with similar methods, is its generality across different image databases. The transfer learning technique demonstrates that the proposed structure can serve as a pre-trained structure for any type of image database. Finally, the proposed structure is applied to three different databases, Massey, ASL Alphabet, and ASL, which contain 2520, 87,000, and 23,400 hand gesture images, respectively. The results show mean accuracies of 99.92% on the Massey test set of 758 images, 99.80% on the ASL test set of 7020 images, and 99.80% on the ASL Alphabet test set of 26,100 images.
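To make the three-stream fusion concrete, the following is a minimal sketch of the idea in Python with OpenCV and NumPy. It is not the paper's implementation: the Gabor bank parameters, the ORB keypoint budget, and the `cnn_embed` placeholder (standing in for the paper's CNN stream) are all illustrative assumptions; only the overall pattern of extracting three independent feature sets and concatenating them into one vector follows the abstract.

```python
# Sketch of a three-stream feature extractor (CNN + Gabor + ORB) with
# feature-level fusion, assuming a preprocessed, background-removed
# grayscale hand-gesture image. Parameters are illustrative, not the paper's.
import cv2
import numpy as np

def gabor_features(gray, ksize=31, sigma=4.0,
                   thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Filter the image with a small Gabor bank and pool each response."""
    feats = []
    for theta in thetas:
        kernel = cv2.getGaborKernel((ksize, ksize), sigma, theta,
                                    lambd=10.0, gamma=0.5)
        response = cv2.filter2D(gray, cv2.CV_32F, kernel)
        feats.extend([response.mean(), response.std()])
    return np.asarray(feats, dtype=np.float32)

def orb_features(gray, n_keypoints=64):
    """Compute ORB descriptors and flatten them to a fixed-length vector."""
    orb = cv2.ORB_create(nfeatures=n_keypoints)
    _, desc = orb.detectAndCompute(gray, None)
    vec = np.zeros(n_keypoints * 32, dtype=np.float32)  # ORB descriptors are 32 bytes each
    if desc is not None:  # ORB may find fewer (or no) keypoints
        flat = desc.astype(np.float32).ravel()[:vec.size]
        vec[:flat.size] = flat
    return vec

def fused_feature_vector(gray, cnn_embed):
    """Concatenate the CNN, Gabor, and ORB streams into one feature vector.

    `cnn_embed` is a hypothetical stand-in for the CNN stream: any callable
    mapping an image to a 1-D embedding. The fused vector would then feed
    a classifier that outputs the hand-gesture class.
    """
    return np.concatenate([cnn_embed(gray),
                           gabor_features(gray),
                           orb_features(gray)])
```

Concatenation at the feature level, as sketched here, lets each stream contribute cues the others miss (learned CNN features, orientation-selective Gabor responses, rotation-tolerant ORB descriptors), which is consistent with the robustness to rotation and ambiguity claimed in the abstract.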