Abstract

To address the low accuracy of existing 3D human pose estimation (HPE) methods and the limited level of detail in Labanotation, we propose an extended Labanotation generation method for intangible cultural heritage dance videos based on 3D HPE. First, the performer's 2D human pose sequence is input together with spatial location embeddings, and multiple spatial transformer modules extract spatial features of the human joints and generate cross-joint multiple hypotheses. Then, temporal features are extracted by a self-attention module, and the correlation between the different hypotheses is learned via bilinear pooling. Finally, the performer's 3D joint coordinates are predicted and matched with the corresponding extended Labanotation symbols using the Laban template matching method to generate extended Labanotation. Experimental results show that, compared with the VideoPose and CrossFormer algorithms, the proposed method reduces the Mean Per Joint Position Error (MPJPE) by 3.7 mm and 0.6 mm, respectively, on the Human3.6M dataset, and that the generated extended Labanotation describes movement details better than basic Labanotation.
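
For a concrete picture of the pipeline, the sketch below is a minimal PyTorch-style illustration, not the authors' implementation: the joint count, channel widths, layer depths, and the two-hypothesis setup are all assumptions. It composes per-frame spatial transformers over joint tokens (with learned spatial location embeddings), a temporal self-attention encoder, and bilinear pooling to correlate the two hypothesis feature maps before regressing 3D joints. MPJPE, the reported metric, is the mean Euclidean distance between predicted and ground-truth 3D joint positions.

import torch
import torch.nn as nn

# All sizes below are illustrative assumptions, not the paper's configuration.
J, C, H = 17, 64, 2   # joints, feature channels, number of hypotheses

class SpatialBlock(nn.Module):
    """Per-frame transformer encoder over the J joint tokens."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(2, C)                   # lift 2D (x, y) to C dims
        self.pos = nn.Parameter(torch.zeros(1, J, C))  # spatial location embedding
        layer = nn.TransformerEncoderLayer(d_model=C, nhead=4, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):                              # x: (B*T, J, 2)
        return self.enc(self.embed(x) + self.pos)      # -> (B*T, J, C)

class Pose3D(nn.Module):
    """2D pose sequence -> multi-hypothesis features -> fused 3D joints."""
    def __init__(self):
        super().__init__()
        # one spatial transformer per hypothesis (cross-joint multiple hypotheses)
        self.spatial = nn.ModuleList([SpatialBlock() for _ in range(H)])
        layer = nn.TransformerEncoderLayer(d_model=J * C, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)  # self-attention over frames
        self.head = nn.Linear(C * C, 3)  # regress 3D coords from pooled features

    def forward(self, x):                              # x: (B, T, J, 2) 2D pose sequence
        B, T = x.shape[:2]
        feats = []
        for blk in self.spatial:
            f = blk(x.reshape(B * T, J, 2)).reshape(B, T, J * C)
            f = self.temporal(f).reshape(B, T, J, C)   # temporal features per joint
            feats.append(f)
        # bilinear pooling: outer product correlates the two hypothesis features
        z = torch.einsum('btjc,btjd->btjcd', feats[0], feats[1])
        return self.head(z.reshape(B, T, J, C * C))    # -> (B, T, J, 3)

def mpjpe(pred, gt):
    """Mean Per Joint Position Error: average L2 distance over joints and frames."""
    return (pred - gt).norm(dim=-1).mean()

The bilinear (outer-product) fusion step is what lets the model learn pairwise correlations between hypothesis features rather than simply averaging them; a plain concatenation would discard those interactions.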
