Abstract
This study addresses the challenge of improving communication between the deaf and hearing communities by exploring different sign language recognition (SLR) techniques. Creating large-scale sign language (SL) datasets is difficult because of privacy concerns and the need for validation by interpreters. The authors address this by presenting CALSE-1000, a new Spanish isolated sign language recognition dataset consisting of 5000 videos of 1000 glosses, recorded with various signers and in various scenarios. The study also proposes using computer vision techniques, such as face swapping and affine transformations, to augment the SL dataset and improve the accuracy of an I3D model trained on the augmented data. Including these augmentations during training improves top-1 accuracy by up to 11.7 points, top-5 accuracy by up to 8.8 points and top-10 accuracy by up to 9 points. The same augmentations could potentially be applied to other datasets and models to improve the state of the art. Furthermore, tests on a dataset in which the face is omitted confirm the importance of facial expressions for the model, and the results show how face swapping can be used to add new anonymous signers without costly and time-consuming recording sessions.
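The abstract does not say how the affine augmentation is parameterised or implemented. As an illustration only, the sketch below assumes each video clip is a NumPy array of shape (T, H, W, C) and samples one random rotation, scale and translation per clip, applying the same transform to every frame so that hand and body motion stay temporally coherent. The function names and the parameter ranges (`max_angle_deg`, `max_scale`, `max_shift`) are hypothetical, not taken from the paper, and nearest-neighbour sampling is used purely for simplicity.

```python
import numpy as np


def random_affine_params(rng, max_angle_deg=10.0, max_scale=0.1, max_shift=0.05):
    """Sample one rotation/scale/translation to be shared by every frame of a clip.

    The ranges are illustrative assumptions, not values from the paper.
    """
    angle = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg))
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    shift = rng.uniform(-max_shift, max_shift, size=2)  # fraction of (width, height)
    return angle, scale, shift


def affine_matrix(angle, scale, shift, height, width):
    """Build a 2x3 matrix mapping source (x, y) to output (x', y') about the image centre."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    cos, sin = np.cos(angle) * scale, np.sin(angle) * scale
    tx, ty = shift[0] * width, shift[1] * height
    # x' = cos*(x - cx) - sin*(y - cy) + cx + tx
    # y' = sin*(x - cx) + cos*(y - cy) + cy + ty
    return np.array([[cos, -sin, cx - cos * cx + sin * cy + tx],
                     [sin, cos, cy - sin * cx - cos * cy + ty]])


def warp_clip(clip, matrix):
    """Warp every frame of a (T, H, W, C) clip with the same affine matrix.

    Uses inverse mapping with nearest-neighbour sampling; pixels that map
    outside the source frame are set to 0.
    """
    t, h, w, c = clip.shape
    inv = np.linalg.inv(np.vstack([matrix, [0.0, 0.0, 1.0]]))
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # 3 x (H*W)
    src = inv @ coords  # source position for each output pixel
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    flat_in = clip.reshape(t, h * w, c)
    flat_out = np.zeros((t, h * w, c), dtype=clip.dtype)
    flat_out[:, valid, :] = flat_in[:, sy[valid] * w + sx[valid], :]
    return flat_out.reshape(t, h, w, c)


def augment_clip(clip, rng):
    """Apply one randomly sampled affine transform consistently to a whole clip."""
    angle, scale, shift = random_affine_params(rng)
    matrix = affine_matrix(angle, scale, shift, clip.shape[1], clip.shape[2])
    return warp_clip(clip, matrix)
```

For example, `augment_clip(clip, np.random.default_rng(0))` returns a clip with the same shape and dtype as the input. Sampling the transform once per clip, rather than once per frame, avoids introducing artificial jitter that a video model such as I3D could mistake for motion. A production pipeline would more likely use bilinear interpolation, for instance via OpenCV's `cv2.warpAffine`.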