Abstract

This work introduces two novel approaches to feature extraction for video-based Arabic sign language recognition: motion representation through motion estimation and motion representation through motion residuals. In the former, motion estimation is used to compute the motion vectors of a video-based deaf sign or gesture. In the preprocessing stage for feature extraction, the horizontal and vertical components of these vectors are rearranged into intensity images and transformed into the frequency domain. In the second approach, motion is represented through motion residuals; the residuals are then thresholded and transformed into the frequency domain. Since both approaches preserve the temporal dimension of the video-based gesture, hidden Markov models are used for classification. Additionally, this paper proposes to project out the temporal dimension of the motion information through either telescopic motion vector composition or polar accumulated differences of motion residuals. Feature vectors are then extracted from the projected motion information, after which classification can be carried out with simple classifiers such as Fisher's linear discriminant. The paper reports the classification accuracy of the proposed solutions. Comparisons with existing work reveal that up to 39% of the misclassifications have been corrected.
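As a rough illustration of the motion-residual path described above, the fragment below thresholds an inter-frame residual and moves it to the frequency domain. The grayscale frame format, the threshold value, and the use of SciPy's 2-D DCT are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch of the motion-residual feature path (assumptions: grayscale
# uint8 frames, an arbitrary threshold of 20, and a 2-D DCT as the transform).
import numpy as np
from scipy.fft import dctn

def residual_features(prev_frame: np.ndarray, curr_frame: np.ndarray,
                      threshold: int = 20) -> np.ndarray:
    """Threshold the inter-frame residual and transform it to the frequency domain."""
    # Signed arithmetic avoids uint8 wraparound in the frame difference.
    residual = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    binary = (residual > threshold).astype(np.float64)  # thresholded residual
    return dctn(binary, norm='ortho')                   # frequency-domain image

def gesture_features(frames: list[np.ndarray]) -> list[np.ndarray]:
    """One transformed residual per successive frame pair preserves temporal order."""
    return [residual_features(a, b) for a, b in zip(frames, frames[1:])]
```

Because one transformed image is produced per frame pair, the resulting feature sequence keeps the gesture's temporal order, which is what makes HMM classification applicable.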

Highlights

  • Used in over 21 countries covering a large geographic and demographic portion of the world, Arabic sign language (ArSL) has received little attention in sign language recognition research

  • The feature vectors are generated by means of zonal coding at a given cutoff. Since this process is repeated for each pair of successive images, the resultant feature vectors retain the temporal dimension of the video-based gesture (see the zonal-coding sketch after these highlights)

  • When using K-nearest neighbor (KNN) classifiers, the projection of the temporal dimension via the polar accumulated differences and telescopic vector composition schemes yields recognition results comparable to those obtained with hidden Markov models (HMMs)
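A minimal sketch of the zonal coding step referenced in the second highlight, assuming a triangular low-frequency zone of the 2-D DCT; the paper's exact zone shape and cutoff value are not specified here.

```python
# Illustrative zonal coding at a given cutoff (the triangular zone is an
# assumption; only the low-frequency coefficients near DC are retained).
import numpy as np

def zonal_code(dct_image: np.ndarray, cutoff: int) -> np.ndarray:
    """Flatten the DCT coefficients that fall inside the zonal mask."""
    rows, cols = np.indices(dct_image.shape)
    mask = (rows + cols) < cutoff  # triangular zone anchored at the DC term
    return dct_image[mask]         # fixed-length feature vector per image
```

Applied to each transformed residual in turn, such per-pair feature vectors form an ordered sequence suitable for HMM training, or can be computed once on a time-projected motion image for KNN or Fisher's linear discriminant.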

Summary

INTRODUCTION

Used in over 21 countries covering a large geographic and demographic portion of the world, Arabic sign language (ArSL) has received little attention in sign language recognition research. Related work on the recognition of non-Arabic sign languages using temporal-domain feature extraction relies mainly on computationally expensive motion analysis approaches such as motion estimation. In [3], the authors proposed to extract spatial and temporal image features; combining Fourier descriptors with motion analysis under an HMM classifier resulted in a classification accuracy of 93.5%. A classification accuracy of 96.21% has also been reported for 40 American Sign Language gestures. This work proposes to enhance ArSL recognition rates via an assortment of novel feature extraction schemes, using the same dataset as the one described in [2]. These schemes include motion representation through motion estimation, telescopic vector composition, motion residuals, and polar accumulated differences (ADs).
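For orientation, the sketch below illustrates one plausible reading of the accumulated-differences (AD) idea listed above: summing thresholded inter-frame residuals collapses the temporal dimension of a gesture into a single motion image, from which a fixed-length feature vector can be drawn for a simple classifier such as Fisher's linear discriminant. The threshold value and the omission of the polar representation are simplifying assumptions, not the paper's formulation.

```python
# Hypothetical sketch of accumulated differences (ADs); the threshold and
# the absence of the polar weighting are assumptions for illustration.
import numpy as np

def accumulated_differences(frames: list[np.ndarray], threshold: int = 20) -> np.ndarray:
    """Collapse a gesture's frames into one time-independent motion image."""
    acc = np.zeros(frames[0].shape, dtype=np.float64)
    for prev, curr in zip(frames, frames[1:]):
        # Signed arithmetic avoids uint8 wraparound in the frame difference.
        residual = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
        acc += residual > threshold  # accumulate binary motion masks over time
    return acc
```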

DATASET DESCRIPTION
FEATURE EXTRACTION SCHEMES
Time-dependent feature extraction
Time-independent feature extraction
EXPERIMENTAL RESULTS
CONCLUSION