Abstract
Currently, 3D human pose estimation has gradually been a well-liked subject. Although various models based on the deep neural network have produced an excellent performance, they still suffer from the ignorance of multiple feasible pose solutions and the problem of the relatively-fixed input length. To solve these issues, a coordinate transformer encoder based on a 2D pose is constructed to generate multiple feasible pose solutions, and multi-to-one pose mapping is employed to generate a reliable pose. A temporal transformer encoder is used to exploit the temporal dependencies of consecutive pose sequences, which avoids the issue of relatively-fixed input length caused by temporal dilated convolution. Adequate experiments indicate that our model achieves a promising performance.
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.