Abstract

Precise annotation of 6-D poses in real data is intricate and time-consuming, yet it is essential for training pose estimation pipelines. To avoid this problem, we propose a scalable, end-to-end approach to 6-D pose regression with weak supervision. Our method requires neither 3-D models nor 6-D object poses as ground truth. Instead, we use 2-D bounding boxes and object sizes as the only labels and constrain the problem with multiple images of known relative poses during training. A novel Rotated-IoU loss relates a pose prediction from one image to the labeled 2-D bounding boxes of the corresponding object in other views. Our rotation estimation combines an initial coarse pose classification with an offset regression using a continuous rotation parametrization that allows for direct pose estimation. At test time, the model still uses only a single image to predict a 6-D pose. We observe that the multi-view constraints and our rotation representation used during training lead to better learning of 6-D pose embeddings in comparison to fully supervised methods. Experiments on several datasets show that the proposed method predicts poses of good quality despite being trained with only weak labels. Because poses are regressed directly, without a subsequent refinement stage, the method runs in real time.
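As an illustration of how a coarse classification can be combined with a continuous offset, the following is a minimal sketch only. The abstract does not specify the parametrization or the set of coarse poses, so the 6-D (Gram-Schmidt) rotation representation, the choice of yaw-only anchor bins, and the names `rotation_from_6d`, `yaw_anchor`, and `compose_rotation` are all assumptions made for this example, not the method described above.

```python
import numpy as np

def rotation_from_6d(v):
    """Map a 6-vector to a rotation matrix via Gram-Schmidt.

    Assumption: a 6-D continuous representation; the abstract only says
    "continuous rotation parametrization" without naming one.
    """
    a1 = np.asarray(v[:3], dtype=float)
    a2 = np.asarray(v[3:], dtype=float)
    b1 = a1 / np.linalg.norm(a1)
    a2 = a2 - np.dot(b1, a2) * b1        # remove the component along b1
    b2 = a2 / np.linalg.norm(a2)
    b3 = np.cross(b1, b2)                # completes a right-handed basis
    return np.stack([b1, b2, b3], axis=1)

def yaw_anchor(k, num_bins):
    """Hypothetical coarse pose: rotation about z by the k-th bin angle."""
    angle = 2.0 * np.pi * k / num_bins
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def compose_rotation(class_logits, offset_6d):
    """Pick the most likely coarse bin, then apply the regressed offset."""
    k = int(np.argmax(class_logits))
    return rotation_from_6d(offset_6d) @ yaw_anchor(k, len(class_logits))
```

Under these assumptions, the classifier only has to select a nearby anchor, while the regressed offset stays small and is always mapped to a valid rotation matrix, so no separate orthogonalization or refinement step is needed.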
