Abstract

Temporal relation modeling is one of the key techniques for describing gesture changes in continuous sign language recognition. However, sign language contains many similar gestures, so focusing only on global information can exacerbate the recognition ambiguity caused by various gesture combinations. To alleviate this problem, we attempt to balance the global and the local information in gesture changes. To this end, we construct a multi-level temporal relation graph (MLTRG). Specifically, the multi-level temporal relation graph of a video sequence is established over different time spans, with the corresponding visual features as graph nodes. Feature fusion and propagation over the multi-level temporal relation graph are then performed by a graph convolutional network (GCN). Finally, we can reason over and balance the global and the local temporal information of gesture changes in continuous sign language videos. We evaluate our method on the large-scale public datasets RWTH-PHOENIX-Weather-2014 and 2014T; the results demonstrate the advantages and effectiveness of our method.

Keywords: Sign language recognition, Temporal relation modeling, Graph convolutional network
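The abstract only outlines the design, so the following PyTorch sketch is one plausible reading of it: frames are connected whenever their temporal distance matches one of several time spans (small spans for local gesture changes, large spans for global context), and a single GCN layer fuses features over that graph. The function names, the span set (1, 2, 4), and the symmetric normalization are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of a multi-level temporal relation graph (MLTRG)
# over frame features, followed by one GCN propagation step.
import torch
import torch.nn as nn

def build_mltrg_adjacency(num_frames: int, spans=(1, 2, 4)) -> torch.Tensor:
    """Connect frame i to frame j whenever |i - j| equals one of the
    time spans (assumed interpretation of the multi-level graph)."""
    A = torch.eye(num_frames)  # self-loops
    for s in spans:
        for i in range(num_frames - s):
            A[i, i + s] = 1.0
            A[i + s, i] = 1.0
    # Symmetric degree normalization: D^{-1/2} A D^{-1/2}
    deg = A.sum(dim=1)
    d_inv_sqrt = deg.pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * A * d_inv_sqrt.unsqueeze(0)

class GCNLayer(nn.Module):
    """One graph-convolution step: X' = ReLU(A_hat @ X @ W)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, a_hat: torch.Tensor) -> torch.Tensor:
        return torch.relu(a_hat @ self.linear(x))

# Usage: 16 frames of 512-d visual features (e.g. from a CNN backbone).
frames = torch.randn(16, 512)
a_hat = build_mltrg_adjacency(16)
layer = GCNLayer(512, 512)
fused = layer(frames, a_hat)  # features mixing local and global temporal cues
print(fused.shape)  # torch.Size([16, 512])
```

With this construction, each propagation step mixes a frame's features with those of its neighbors at every span level at once, which is one way to balance local and global temporal information as the abstract describes.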
