Abstract

Most dynamic sign word misclassifications are caused by redundant spatial–temporal (SPT) feature pruning that discards language semantics and temporal dependencies. SPT feature recognition is therefore a key factor in evaluating the misclassification of dynamic sign words: redundant pruning of the SPT feature space affects sign-confusion language modeling, model complexity, and SPT feature similarity. The purpose of this article is to develop a new multi-scale SPT feature-based dynamic sign word recognition approach via a low-cost feature selection (FS) method and an end-to-end Fourier convolutional neural network (EFCNN). Instead of using a sensor fusion technique to obtain frame position alignment, the EFCNN determines new 3D frame positions and coordinates with a pixel weighting and alignment function applied to the first and succeeding 25 spatial intensities of the 3D video as the hand moves. The new spatial weights and the original spatial coordinates are fused and truncated in the Fourier domain, and the temporal dependence of the fused features is then generated. A feature selection scheme, FS-EFCNN, is introduced to select compact features while preserving language meaning. Five state-of-the-art feature selection methods, namely Infinite FS (InFS), Relief FS, Fisher, MIM, and ILFS, together with the ensemble FS-EFCNN, were deployed to guide and optimize the learning performance of the EFCNN. Experimental analysis shows that the FS-EFCNN method achieves the best accuracies of 99.86%, 99.89%, and 90.69% on the 3D American Sign Language, British Sign Language, and Greek Sign Language data sets, respectively.
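The Fourier-domain fusion-and-truncation step described above can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the function name `fourier_fuse_truncate`, the element-wise fusion rule, and the `keep_ratio` parameter are all assumptions introduced here for clarity.

```python
import numpy as np

def fourier_fuse_truncate(spatial_weights, spatial_coords, keep_ratio=0.5):
    """Hypothetical sketch: fuse per-pixel spatial weights with the
    original spatial coordinates, move to the Fourier domain, and
    truncate high-frequency coefficients to obtain a compact feature."""
    # Fuse (element-wise weighting is an assumption; the paper's exact
    # fusion rule is not given in the abstract)
    fused = spatial_weights * spatial_coords
    # Transform each frame's spatial axis to the frequency domain
    spectrum = np.fft.rfft(fused, axis=-1)
    # Truncate: keep only the lowest `keep_ratio` fraction of coefficients
    k = max(1, int(spectrum.shape[-1] * keep_ratio))
    return spectrum[..., :k]

# Toy usage: 25 frames x 64 spatial samples per frame
weights = np.random.rand(25, 64)
coords = np.random.rand(25, 64)
compact = fourier_fuse_truncate(weights, coords, keep_ratio=0.25)
print(compact.shape)  # (25, 8): 25 frames, 8 retained Fourier coefficients
```

Truncating in the Fourier domain keeps the low-frequency components that carry most of the signal energy, which is one common way to obtain a compact, less redundant representation before feature selection.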
