Abstract

Synthetic 3D object models have proven crucial in object pose estimation, as they can be used to generate large amounts of accurately annotated data. The object pose estimation problem is usually solved for images from the real data domain by employing synthetic images to enrich the training data, without fully accounting for the fact that synthetic and real images may follow different data distributions. In this work, we argue that the 3D object pose estimation problem is easier to solve for images originating from the synthetic domain than for images from the real data domain. To this end, we propose a 3D object pose estimation framework consisting of two steps: a novel pose-oriented image-to-image translation step first translates noisy real images into clean synthetic ones, and a 3D object pose estimation method is then applied to the translated images to predict the 3D object poses. A novel pose-oriented objective function is employed for training the image-to-image translation network, enforcing that pose-related object image characteristics are preserved in the translated images. As a result, the pose estimation network does not require real data for training. Experimental evaluation shows that the proposed framework greatly improves 3D object pose estimation performance compared to state-of-the-art methods.
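To make the described two-step pipeline and the pose-oriented objective more concrete, the following is a minimal PyTorch sketch of how such a framework could be wired together. It is not the authors' implementation: the module names (`translator`, `pose_estimator`, `pose_feature_extractor`), the choice of an L1 reconstruction term, and the weighting factor `lambda_pose` are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class PoseOrientedTranslationLoss(nn.Module):
    """Hypothetical combined objective for the real-to-synthetic translator:
    an image reconstruction term plus a pose-preservation term comparing
    pose-related features of the translated and target synthetic images."""

    def __init__(self, pose_feature_extractor: nn.Module, lambda_pose: float = 1.0):
        super().__init__()
        # Assumed frozen network that extracts pose-related image features.
        self.pose_features = pose_feature_extractor
        self.lambda_pose = lambda_pose
        self.recon = nn.L1Loss()

    def forward(self, translated_img: torch.Tensor, target_synthetic_img: torch.Tensor) -> torch.Tensor:
        # Standard image-to-image reconstruction term.
        loss_img = self.recon(translated_img, target_synthetic_img)
        # Pose-oriented term: keep pose-related characteristics intact.
        with torch.no_grad():
            target_feats = self.pose_features(target_synthetic_img)
        loss_pose = self.recon(self.pose_features(translated_img), target_feats)
        return loss_img + self.lambda_pose * loss_pose


def estimate_pose(real_img: torch.Tensor, translator: nn.Module, pose_estimator: nn.Module) -> torch.Tensor:
    """Two-step inference: translate the real image into the synthetic domain,
    then run the pose estimator (trained only on synthetic data) on the result."""
    with torch.no_grad():
        synthetic_like = translator(real_img)
        return pose_estimator(synthetic_like)
```

In this sketch the pose estimator never sees real images during training, mirroring the abstract's claim that real data is only needed to train the translation network.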
