Abstract

While machine translation for spoken languages has advanced significantly, research on sign language translation (SLT) for deaf individuals remains limited. One obstacle is that annotations such as glosses are expensive and time-consuming to obtain. To address this, we propose a new sign language video-processing method for SLT that requires no gloss annotations. Our approach uses the signer's skeleton keypoints to capture their movements, which helps build a model that is robust to background noise. We also introduce a keypoint normalization process that preserves the signer's movements while accounting for variations in body length. Furthermore, we propose a stochastic frame selection technique that prioritizes frames so as to minimize the loss of video information. Using an attention-based model, we demonstrate the effectiveness of our approach through quantitative experiments with several metrics on German and Korean sign language datasets without glosses.
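
As an illustration only, the two preprocessing ideas named above might be sketched in Python as follows. The abstract does not specify either procedure, so everything here is an assumption: normalization is taken to mean centering each frame on the shoulder midpoint and dividing by the clip's mean shoulder width, and stochastic frame selection is taken to mean drawing one random frame from each of several equal-width segments of the video. The function names and the shoulder indices are hypothetical and depend on the pose estimator used.

```python
import math
import random


def normalize_keypoints(frames, left_shoulder=0, right_shoulder=1):
    """Center each frame on the shoulder midpoint and scale by mean shoulder width.

    frames: list of frames, each a list of (x, y) keypoints.
    The shoulder indices are hypothetical placeholders.
    """
    # Use one scale for the whole clip so that motion between frames is preserved.
    widths = [math.dist(f[left_shoulder], f[right_shoulder]) for f in frames]
    scale = sum(widths) / len(widths)
    if scale == 0:
        raise ValueError("shoulder width is zero; cannot normalize")

    normalized = []
    for f in frames:
        # Midpoint of the two shoulders serves as the per-frame origin.
        cx = (f[left_shoulder][0] + f[right_shoulder][0]) / 2
        cy = (f[left_shoulder][1] + f[right_shoulder][1]) / 2
        normalized.append([((x - cx) / scale, (y - cy) / scale) for x, y in f])
    return normalized


def stochastic_frame_indices(num_frames, num_samples, rng=None):
    """Pick one random frame index from each of num_samples equal-width segments."""
    if num_samples <= 0 or num_frames < num_samples:
        raise ValueError("need 0 < num_samples <= num_frames")
    rng = rng or random.Random()
    indices = []
    for i in range(num_samples):
        start = i * num_frames // num_samples
        end = (i + 1) * num_frames // num_samples  # exclusive upper bound
        indices.append(rng.randrange(start, end))
    return indices
```

Sampling one frame per segment keeps the selected frames spread across the whole clip while still varying which frames are seen on each pass. This is one common way to reduce a video to a fixed length and is not necessarily the selection rule used in the paper.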
