Abstract

The technology of Three-dimensional human posture estimation is used in sports, motion recognition, and special effects of video media. Among various methods for this, multi-view 3D human pose estimation is essential for precise estimation even in complex real-world environments. But Existing models for multi-view 3D human posture estimation have the disadvantage of high order of time complexity as they use 3D feature maps. This paper proposes a method to extend an existing monocular viewpoint multi-frame model based on Transformer with lower time complexity to 3D human posture estimation for multi-viewpoints. To expand to multi-viewpoints our proposed method first generates an 8-dimensional joint coordinate that connects 2-dimensional joint coordinates for 17 joints at 4-vieiwpoints acquired using the 2-dimensional human posture detector, CPN(Cascaded Pyramid Network). This paper then converts them into 17×32 data with patch embedding, and enters the data into a transformer model, finally. Consequently, the MLP(Multi-Layer Perceptron) block that outputs the 3D-human posture simultaneously updates the 3D human posture estimation for 4-viewpoints at every iteration. Compared to Zheng[5]'s method the number of model parameters of the proposed method was 48.9%, MPJPE(Mean Per Joint Position Error) was reduced by 20.6 mm (43.8%) and the average learning time per epoch was more than 20 times faster.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.