Abstract

Although analyzing animal shape and pose has potential applications in many fields, there is little work on 3D animal pose estimation. This can be attributed to two factors: the lack of large-scale, well-annotated datasets, and perspective ambiguities that make the mapping from 2D to 3D ill-posed. To address data scarcity, we propose an unsupervised method to estimate 3D animal pose given only 2D poses. To deal with perspective ambiguities, we introduce a canonical consistency loss and a camera consistency loss that impose geometric priors during training, and combine a reprojection loss with a 2D pose discriminator to enable self-supervised learning. Specifically, given a 2D pose, the pose generator network produces a corresponding 3D pose and the camera network estimates a camera rotation. During training, the generated 3D pose is reprojected under randomly sampled camera viewpoints to synthesize a new 2D pose. The synthesized 2D pose is decomposed into a 3D pose and a camera rotation, on which consistency losses are imposed for both the canonical 3D poses and the camera rotations, enabling self-supervised training. We evaluate the proposed method on synthetic and real datasets, i.e., SMAL and AcinoSet. The experimental results demonstrate the effectiveness of the proposed method, and we achieve state-of-the-art performance among unsupervised algorithms for 3D animal canonical pose estimation.

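To make the training cycle concrete, below is a minimal PyTorch sketch of the lift-reproject-lift loop described in the abstract. The network architectures, keypoint count, rotation parameterization (the 6D representation of Zhou et al. 2019), and loss weights are all illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

J = 17  # number of keypoints (assumed)

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))

pose_gen = mlp(2 * J, 3 * J)  # lifts a 2D pose to a canonical 3D pose
cam_net  = mlp(2 * J, 6)      # predicts a camera rotation (6D param., assumed)
disc     = mlp(2 * J, 1)      # 2D pose discriminator

def rot_from_6d(v):
    """Gram-Schmidt 6D-to-rotation-matrix map (an assumed parameterization)."""
    a, b = v[:, :3], v[:, 3:]
    x = nn.functional.normalize(a, dim=1)
    b = b - (x * b).sum(1, keepdim=True) * x
    y = nn.functional.normalize(b, dim=1)
    z = torch.cross(x, y, dim=1)
    return torch.stack([x, y, z], dim=2)

def project(X, R):
    """Rotate the canonical 3D pose into the camera frame and drop depth."""
    cam = torch.einsum('bij,bkj->bki', R, X)
    return cam[..., :2]  # orthographic projection (assumed)

def training_step(x2d):  # x2d: (B, J, 2) observed 2D poses
    B = x2d.size(0)
    X = pose_gen(x2d.flatten(1)).view(B, J, 3)         # canonical 3D pose
    R = rot_from_6d(cam_net(x2d.flatten(1)))           # estimated rotation
    loss_reproj = ((project(X, R) - x2d) ** 2).mean()  # reprojection loss

    # Reproject under a randomly sampled camera to synthesize a new 2D pose.
    R_rand = rot_from_6d(torch.randn(B, 6))
    x2d_syn = project(X, R_rand)

    # Decompose the synthesized pose and impose the two consistency losses.
    X2 = pose_gen(x2d_syn.flatten(1)).view(B, J, 3)
    R2 = rot_from_6d(cam_net(x2d_syn.flatten(1)))
    loss_canon = ((X2 - X.detach()) ** 2).mean()   # canonical consistency
    loss_cam   = ((R2 - R_rand) ** 2).mean()       # camera consistency

    # Adversarial term: synthesized 2D poses should look realistic.
    loss_adv = nn.functional.softplus(-disc(x2d_syn.flatten(1))).mean()

    return loss_reproj + loss_canon + loss_cam + 0.1 * loss_adv  # weights assumed

# Smoke test on random data.
print(training_step(torch.randn(4, J, 2)))
```

The discriminator would be trained in alternation against real 2D poses, and the `detach()` on the first lifted pose is one plausible way to keep the canonical consistency loss from collapsing both branches; both choices are sketch-level assumptions.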