Abstract

Sign language translation (SLT) is a challenging weakly supervised task without word-level annotations. An effective method of SLT is to leverage multimodal complementarity and to explore implicit temporal cues. In this work, we propose a graph-based multimodal sequential embedding network (MSeqGraph), in which multiple sequential modalities are densely correlated. Specifically, we build a graph structure to realize the intra-modal and inter-modal correlations. First, we design a graph embedding unit (GEU), which embeds a parallel convolution with channel-wise and temporal-wise learning into the graph convolution to learn the temporal cues in each modal sequence and cross-modal complementarity. Then, a hierarchical GEU stacker with a pooling-based skip connection is proposed. Unlike the state-of-the-art methods, to obtain a compact and informative representation of multimodal sequences, the GEU stacker gradually compresses the channel <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$d$</tex-math></inline-formula> with multi-modalities <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$m$</tex-math></inline-formula> rather than the temporal dimension <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$t$</tex-math></inline-formula> . Finally, we adopt the connectionist temporal decoding strategy to explore the entire video’s temporal transition and translate the sentence. Extensive experiments on the USTC-CSL and BOSTON-104 datasets demonstrate the effectiveness of the proposed method.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.