Abstract

Sign language enables communication without spoken words. American Sign Language (ASL) emerged at the American School for the Deaf in the early 19th century. It is a naturally evolved language that combines facial expressions and hand gestures to convey thoughts and ideas, and today it is used predominantly by people who are deaf or hard of hearing. Unlike most languages, ASL is not widely taught, which makes it difficult for the general population to communicate effectively with people for whom ASL is the primary means of communication. This motivates a system that detects and predicts letters from images and can run in real time to bridge this language barrier. This research develops a sign language recognition system on top of YOLOX, a convolutional-neural-network detector that builds on YOLOv3. Using the various YOLOX backbones, this paper introduces six models spanning the accuracy/inference-time spectrum, from the least accurate with the fastest response time to the most accurate with the slowest response time. I propose PereiraASLNet, which trains YOLOX on 26 custom classes (the letters A-Z) using a Pascal VOC XML American Sign Language dataset published by Roboflow. Variants were developed for every YOLOX backbone architecture, namely YOLOX-nano, YOLOX-tiny, YOLOX-small, YOLOX-medium, YOLOX-large and YOLOX-xlarge, and compared on mean average precision and inference time. The testing mean average precision of the models was 0.9046, 0.9070, 0.9227, 0.9304, 0.9329 and 0.9578, and the testing inference time was 3.50 ms, 12.97 ms, 34.86 ms, 64.56 ms, 83.23 ms and 97.56 ms, respectively.
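For context, a minimal sketch of how such an experiment might be configured with the open-source YOLOX codebase is shown below. The file name, dataset path and multiplier choice are illustrative assumptions, not the paper's actual configuration; a Pascal VOC dataset would additionally require overriding the data-loader methods, as in YOLOX's exps/example/yolox_voc example.

```python
# pereira_aslnet_tiny_exp.py -- hypothetical experiment file (a sketch,
# not the paper's actual configuration)
import os

from yolox.exp import Exp as BaseExp


class Exp(BaseExp):
    def __init__(self):
        super().__init__()
        # 26 custom classes, one per ASL letter A-Z
        self.num_classes = 26
        # depth/width multipliers select the backbone variant; 0.33/0.375
        # is YOLOX-tiny (nano: 0.33/0.25, small: 0.33/0.50,
        # medium: 0.67/0.75, large: 1.00/1.00, xlarge: 1.33/1.25)
        self.depth = 0.33
        self.width = 0.375
        # assumed location of the Roboflow ASL dataset (Pascal VOC XML);
        # VOC-format loading would also need get_data_loader /
        # get_eval_loader overrides, as in exps/example/yolox_voc
        self.data_dir = os.path.join("datasets", "asl_voc")
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
```

Training each variant would then use the standard entry point, e.g. `python tools/train.py -f pereira_aslnet_tiny_exp.py -d 1 -b 16 --fp16`, repeated once per backbone to obtain the six accuracy/latency trade-off points reported above.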
