Abstract

Coordinate regression has established itself as one of the most successful current trends in model-based six-degree-of-freedom (6-DOF) object pose estimation from a single image. The underlying idea is to train a system that regresses the three-dimensional object coordinates, given an input RGB or RGB-D image and known object geometry, followed by a robust procedure such as RANSAC to optimize the object pose. These coordinate-regression-based approaches achieve state-of-the-art performance by using pixel-level cues to model the probability distribution of object parts within the image. However, they fail to capture global information at the object level and therefore do not learn accurate foreground/background segmentation. In this letter, we show that combining global features for object segmentation with local features for coordinate regression yields pixel-accurate object boundary detections and, consequently, a substantial reduction in outliers and an increase in overall performance. We propose a deep architecture with an instance-level object segmentation network that exploits global image information for object/background segmentation and a pixel-level classification network for coordinate regression based on local features. We evaluate our approach on the standard ground-truth 6-DOF pose estimation benchmarks and show that our joint approach to accurate object segmentation and coordinate regression achieves state-of-the-art performance on both RGB and RGB-D 6-DOF pose estimation.
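The pipeline the abstract describes — per-pixel object-coordinate predictions restricted to the segmented foreground, followed by a robust RANSAC pose fit — can be sketched for the RGB-D case, where regressed object coordinates are matched against observed camera-space 3D points. This is only an illustrative sketch under simplifying assumptions: the hypothesis pose here is fit with the Kabsch (SVD) algorithm on 3-point samples and scored by a fixed inlier threshold, whereas the paper's actual networks and hypothesis scoring are more involved; all names below are hypothetical.

```python
import numpy as np

def kabsch(P, Q):
    """Least-squares rigid transform (R, t) mapping point set P onto Q."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection in the SVD solution.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t

def ransac_pose(obj_coords, cam_points, iters=200, thresh=0.01, rng=None):
    """RANSAC over minimal 3-point samples of predicted correspondences.

    obj_coords : (N, 3) regressed object-space coordinates (foreground pixels only)
    cam_points : (N, 3) observed camera-space 3D points for the same pixels
    Returns the best (R, t) and the inlier mask.
    """
    rng = np.random.default_rng(rng)
    best_R, best_t, best_inl = None, None, np.zeros(len(obj_coords), bool)
    for _ in range(iters):
        idx = rng.choice(len(obj_coords), size=3, replace=False)
        R, t = kabsch(obj_coords[idx], cam_points[idx])
        err = np.linalg.norm(obj_coords @ R.T + t - cam_points, axis=1)
        inl = err < thresh
        if inl.sum() > best_inl.sum():
            best_R, best_t, best_inl = R, t, inl
    # Refine the winning hypothesis on its full inlier set.
    if best_inl.sum() >= 3:
        best_R, best_t = kabsch(obj_coords[best_inl], cam_points[best_inl])
    return best_R, best_t, best_inl
```

The segmentation network's role in this sketch is simply to decide which pixels contribute rows to `obj_coords`/`cam_points`: tighter masks mean fewer background outliers entering RANSAC, which is the effect the letter quantifies.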
