Abstract

Estimating and tracking the 6DoF poses of objects in images is a challenging problem of great importance for robotic interaction and augmented reality. Recent approaches applying deep neural networks to pose estimation have shown encouraging results. However, most of them rely on training with real images of objects, which imposes severe limitations on ground-truth pose acquisition, coverage of the full range of possible poses, scaling of the training dataset, and generalization capability. This paper presents a novel approach that uses a Convolutional Neural Network (CNN) trained exclusively on single-channel Synthetic images of objects to regress 6DoF object Poses directly (SynPo-Net). SynPo-Net combines a network architecture specifically designed for pose regression with a domain adaptation scheme that transforms real and synthetic images into an intermediate domain better suited for establishing correspondences. An extensive evaluation shows that our approach significantly outperforms state-of-the-art methods trained on synthetic data in terms of both accuracy and speed. Our system can be used to estimate the 6DoF pose from a single frame, or it can be integrated into a tracking system to provide the initial pose.

Highlights

  • Robotic interaction plays an essential role in automatic production, showing a significant increase in demand in recent years [1]

  • We compare against the state-of-the-art by evaluating our proposed Convolutional Neural Network (CNN) on the entire LINEMOD and TUD-L [57] datasets

  • We propose SynPo-Net, a novel CNN-based approach for 6 Degree-of-Freedom (6DoF) object pose estimation trained exclusively on synthetic RGB images that are reduced to single-channel images in pre-processing
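
The reduction of RGB images to a single channel mentioned in the last highlight can be illustrated with a minimal sketch. The text here does not specify how the reduction is performed, so the luminance weighting below (ITU-R BT.601 coefficients) and the function name `rgb_to_single_channel` are assumptions chosen for illustration only; they are not the pre-processing or the intermediate domain used by SynPo-Net.

```python
import numpy as np

# BT.601 luma weights: an illustrative assumption, not taken from the paper.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114], dtype=np.float32)


def rgb_to_single_channel(image_rgb: np.ndarray) -> np.ndarray:
    """Collapse an (H, W, 3) RGB image into an (H, W) float32 image.

    Integer inputs are assumed to be 8-bit and are scaled to [0, 1];
    floating-point inputs are assumed to be in [0, 1] already.
    """
    if image_rgb.ndim != 3 or image_rgb.shape[2] != 3:
        raise ValueError("expected an array of shape (H, W, 3)")
    scaled = image_rgb.astype(np.float32)
    if np.issubdtype(image_rgb.dtype, np.integer):
        scaled /= 255.0  # assumes 8-bit integer input
    # Weighted sum over the channel axis: (H, W, 3) @ (3,) -> (H, W)
    return scaled @ LUMA_WEIGHTS
```

Applying the same reduction to rendered and camera images would give both inputs the same single-channel shape, which is the only property this sketch is meant to show.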


Summary

Introduction

Robotic interaction plays an essential role in automatic production, and demand for it has increased significantly in recent years [1]. Depth information enables more reliable pose estimation for low-textured objects, especially under challenging lighting conditions. Monocular camera setups, by contrast, are low-cost, more compact, and already available on most current mobile devices. Pose estimation algorithms that rely only on RGB image data are therefore of great importance, although they also pose significant challenges. Typical features used in image processing, such as ORB features [7], are limited in how well they handle scale, rotation and illumination variations of the targets. They require target objects with strong edge features.
