Abstract
Easy recording and sharing of video content has led to the creation and distribution of increasing quantities of sign language (SL) content. At present, locating SL videos on a desired topic depends on the existence and correctness of metadata indicating both the language and the topic of the video. Automated techniques that detect sign language content can help address this problem. This paper compares metadata-based classifiers and multimodal classifiers, using both early and late fusion techniques, with video content-based classifiers from the literature. A comparison of TF-IDF, LDA, and NMF for generating metadata features indicates that NMF performs best, both when used independently and when combined with video features. Multimodal classifiers perform better than unimodal SL video classifiers. In our experiments, multimodal features achieved up to 86% precision, 81% recall, and an 84% F1 score. This represents an F1-score improvement of roughly 9% over the video-based approach reported in the literature and of 6% over text-based features extracted using NMF.
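To make the feature pipeline concrete, the sketch below illustrates one plausible reading of the approach described above: NMF topic features extracted from video metadata, early-fused by concatenation with video features, and fed to a classifier. It uses scikit-learn, and all data and names (metadata_texts, video_features, the 4-topic NMF, the logistic-regression classifier) are illustrative assumptions rather than the paper's actual implementation.

    # Illustrative sketch only (assumed pipeline, not the paper's code):
    # NMF topic features from metadata, early-fused with video features.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import NMF
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import precision_recall_fscore_support

    # Toy stand-ins for the real data (hypothetical).
    metadata_texts = ["asl lesson greetings sign language",
                      "cooking pasta recipe tutorial"] * 50
    video_features = np.random.default_rng(0).random((100, 16))  # e.g., per-video motion descriptors
    labels = np.array([1, 0] * 50)                                # 1 = sign language content

    # Text branch: TF-IDF weighting followed by NMF topic decomposition.
    tfidf_matrix = TfidfVectorizer().fit_transform(metadata_texts)
    topic_features = NMF(n_components=4, init="nndsvda",
                         random_state=0).fit_transform(tfidf_matrix)

    # Early fusion: concatenate metadata topic features with video features.
    fused = np.hstack([topic_features, video_features])

    # Train a simple classifier on the fused representation and report
    # precision, recall, and F1, the metrics quoted in the abstract.
    X_tr, X_te, y_tr, y_te = train_test_split(fused, labels, test_size=0.3,
                                              random_state=0, stratify=labels)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_te, clf.predict(X_te), average="binary")
    print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

A late-fusion variant would instead train separate text-only and video-only classifiers and combine their predicted scores, which is the other fusion strategy the abstract mentions.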