Abstract

Advances in image- and video-based 3D pose estimation have driven rapid progress in service robots, human-computer interaction, and 3D motion-sensing games. Nevertheless, 3D pose estimation remains one of the most challenging tasks in computer vision. On the one hand, the diversity of poses, occlusion and self-occlusion, changes in illumination, and complex backgrounds increase the difficulty of human pose estimation. On the other hand, many application scenarios require 3D pose estimation to run in real time. We therefore present a 3D pose estimation method based on binocular vision. For each frame of the binocular video, the human bodies are first detected; a Stacked Hourglass network then detects the human joints, yielding the pixel coordinates of the key joints of every person in both binocular images. Finally, using the calibrated intrinsic and extrinsic camera parameters, the 3D coordinates of the major joints in the world coordinate system are estimated. The method does not rely on 3D datasets for training and requires only a binocular camera to perform 3D pose estimation. Experimental results show that the method locates key joints precisely and achieves real-time performance against complex backgrounds.
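
The final step, recovering a 3D joint position from its pixel coordinates in the two calibrated views, can be illustrated with standard linear (DLT) triangulation. The abstract does not say which triangulation scheme is used, so the following is only a minimal sketch under that assumption. The function names, the camera intrinsics, the 0.1 m baseline, and the test joint are all hypothetical values chosen for illustration, and the 2D joint coordinates are simulated by projection rather than produced by a Stacked Hourglass network.

```python
import numpy as np

def projection_matrix(K, R, t):
    """Build the 3x4 projection matrix P = K [R | t] from intrinsics K
    and extrinsics (R, t) that map world coordinates to camera coordinates."""
    return K @ np.hstack([R, np.asarray(t, dtype=float).reshape(3, 1)])

def project(P, X):
    """Project a 3D world point X to pixel coordinates (u, v)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate_joint(P_left, P_right, uv_left, uv_right):
    """Linear (DLT) triangulation of one joint from two views.

    Each view contributes two linear constraints on the homogeneous
    3D point; the solution is the right singular vector of A that
    corresponds to the smallest singular value.
    """
    u1, v1 = uv_left
    u2, v2 = uv_right
    A = np.stack([
        u1 * P_left[2] - P_left[0],
        v1 * P_left[2] - P_left[1],
        u2 * P_right[2] - P_right[0],
        v2 * P_right[2] - P_right[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # de-homogenize

# Hypothetical calibration: identical intrinsics, right camera shifted
# 0.1 m along the x axis (world frame coincides with the left camera).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P_left = projection_matrix(K, np.eye(3), [0.0, 0.0, 0.0])
P_right = projection_matrix(K, np.eye(3), [-0.1, 0.0, 0.0])

# Hypothetical joint position in metres; its 2D detections are simulated
# by projecting it into both views.
joint_world = np.array([0.2, -0.1, 2.5])
uv_left = project(P_left, joint_world)
uv_right = project(P_right, joint_world)

joint_est = triangulate_joint(P_left, P_right, uv_left, uv_right)
```

With noise-free detections the estimate matches the original point up to floating-point error. With real detections, each joint of each person would be triangulated independently from its matched left and right pixel coordinates, and the result would carry the 2D detection error.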
