Abstract

Sign Language Synthesis (SLS) is a domain-specific problem in which videos of individual sign language words are stitched together to generate a video of a whole sentence, facilitating communication between deaf or hard-of-hearing people and the hearing population. This paper presents a Variable Motion Frame Interpolation (VMFI) method for highly fluent SLS from scattered videos. Existing approaches to SLS rely mainly on mechanical virtual-human technology, which lacks flexibility and natural effect. Moreover, representative frame-interpolation methods usually assume that moving objects travel at constant speed, an assumption ill-suited to predicting the complex hand motion across frames of scattered sign language videos. To address these issues, the proposed VMFI incorporates acceleration to predict more accurate interpolated frames with an end-to-end convolutional neural network. The VMFI framework consists of a variable optical flow estimation network and a high-quality frame synthesis network, which approximate and fuse the intermediate optical flow to generate the interpolated frames for synthesis. Experimental results on a realistic Chinese sign language dataset we collected demonstrate that the proposed VMFI model outperforms two other representative methods in PSNR (Peak Signal-to-Noise Ratio), SSIM (Structural Similarity), and MA (Motion Activity), and achieves a higher MOS (Mean Opinion Score).
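The key idea is to replace the linear (constant-velocity) flow scaling used by standard frame interpolation with an acceleration-aware motion model. As a sketch of what such a model can look like (the abstract does not give VMFI's exact parameterization; the formulation below follows standard quadratic video interpolation), suppose flows $f_{0 \to 1}$ and $f_{0 \to -1}$ are estimated from three consecutive frames $I_{-1}, I_0, I_1$. Fitting the per-pixel trajectory $x(t) = x(0) + v_0 t + \tfrac{1}{2} a t^2$ yields $v_0 = \tfrac{1}{2}(f_{0 \to 1} - f_{0 \to -1})$ and $a = f_{0 \to 1} + f_{0 \to -1}$, so the intermediate flow to time $t \in (0, 1)$ is

\[
f_{0 \to t} \;=\; \frac{f_{0 \to 1} + f_{0 \to -1}}{2}\, t^2 \;+\; \frac{f_{0 \to 1} - f_{0 \to -1}}{2}\, t,
\]

which reduces to the constant-velocity rule $f_{0 \to t} = t \, f_{0 \to 1}$ whenever $f_{0 \to -1} = -f_{0 \to 1}$, i.e. when the acceleration term vanishes.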
