Human-Computer Interaction (HCI) focuses on the interaction between humans and machines. Hand gesture recognition techniques, major candidates for HCI, have an extensive list of applications spanning various fields, one of which is sign language recognition. In this field, however, high accuracy and robustness are both needed, and both present a major challenge. In addition, feature extraction from hand gesture images is difficult because of the many parameters associated with them. This paper proposes an approach based on a bag-of-words (BoW) model for automatic recognition of American Sign Language (ASL) numbers. In this method, the first step is to obtain a set of representative vocabulary words by applying the K-means clustering algorithm to a few randomly chosen images. Next, the vocabulary words serve as bin centers for constructing BoW histograms. The proposed histograms are shown to provide distinguishing features for classifying ASL numbers. For classification, a K-nearest-neighbors (kNN) classifier is employed, using the BoW histogram bin frequencies as features. For validation, extensive experiments are conducted on two large ASL number-recognition datasets; the proposed method shows superior performance in classifying the numbers, achieving an F1 score of 99.92% on the Kaggle ASL numbers dataset.
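The pipeline the abstract outlines (K-means vocabulary, BoW histograms as features, kNN classification) can be sketched in a few lines. The sketch below is illustrative only, not the paper's implementation: it uses synthetic 2-D local descriptors in place of real image features, and all function names, cluster counts, and data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    """Toy K-means: returns k cluster centers (the 'vocabulary')."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest center
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

def bow_histogram(descriptors, vocab):
    """Quantize local descriptors to vocabulary words; return word frequencies."""
    words = np.linalg.norm(descriptors[:, None] - vocab[None], axis=2).argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocab)).astype(float)
    return hist / hist.sum()

def knn_predict(train_hists, train_labels, h, k=3):
    """Classify histogram h by majority vote among its k nearest neighbors."""
    order = np.argsort(np.linalg.norm(train_hists - h, axis=1))
    vals, counts = np.unique(train_labels[order[:k]], return_counts=True)
    return vals[counts.argmax()]

def make_image(cls, n=40):
    """Synthetic 'image': a bag of 2-D descriptors drawn near class-specific modes."""
    base = np.array([[0.0, 0.0], [3.0, 0.0]]) if cls == 0 else np.array([[0.0, 3.0], [3.0, 3.0]])
    return base[rng.integers(0, 2, n)] + rng.normal(0, 0.3, (n, 2))

# build a vocabulary from training descriptors, then BoW features + kNN
train = [(make_image(c), c) for c in [0, 1] * 10]
vocab = kmeans(np.vstack([d for d, _ in train]), k=4)
H = np.array([bow_histogram(d, vocab) for d, _ in train])
y = np.array([c for _, c in train])
pred = knn_predict(H, y, bow_histogram(make_image(1), vocab))
```

Because the two synthetic classes draw descriptors from disjoint modes, their word-frequency histograms separate cleanly, which is the property the abstract claims for real ASL number images.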