Abstract

In this paper, a framework is proposed for object recognition and pose estimation from color images using convolutional neural networks (CNNs). 3D object pose estimation combined with object recognition has numerous applications, such as positioning a robot relative to a target object and robotic object grasping. Previous methods addressing this problem relied on both color and depth (RGB-D) images to learn low-dimensional viewpoint descriptors for object pose retrieval. In the proposed method, a novel quaternion-based multi-objective loss function is used, which combines manifold learning and regression to learn 3D pose descriptors and to perform direct 3D object pose estimation, using only color (RGB) images. The 3D object pose can then be obtained either by using the learned descriptors in a nearest neighbor (NN) search or by direct neural network regression. An extensive experimental evaluation demonstrates that such descriptors provide greater pose estimation accuracy than state-of-the-art methods. In addition, the learned 3D pose descriptors are almost object-independent and, thus, generalizable to unseen objects. Finally, when the object identity is not of interest, the 3D object pose can be regressed directly from the network, bypassing the NN search and thus significantly reducing the object pose inference time.
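The two retrieval ingredients mentioned above can be illustrated concretely. The following is a minimal sketch (not the paper's implementation): a quaternion angular-distance function of the kind commonly used in quaternion-based pose losses, accounting for the double cover (q and -q encode the same rotation), and an L2 nearest-neighbor lookup that maps a learned descriptor to the pose of its closest template. All function and variable names here are illustrative assumptions.

```python
import numpy as np

def quaternion_angle(q1, q2):
    """Angular distance (radians) between two unit quaternions.
    The absolute value of the dot product handles the double
    cover: q and -q represent the same 3D rotation."""
    d = abs(float(np.dot(q1, q2)))
    return 2.0 * np.arccos(np.clip(d, 0.0, 1.0))

def nn_pose_retrieval(query_desc, db_descs, db_quats):
    """Return the pose (quaternion) of the template whose learned
    descriptor is closest to the query descriptor in L2 distance."""
    dists = np.linalg.norm(db_descs - query_desc, axis=1)
    return db_quats[int(np.argmin(dists))]
```

In a descriptor-based pipeline of this kind, `db_descs` would hold descriptors computed by the CNN for rendered template views with known poses `db_quats`; direct regression instead outputs the quaternion from the network itself, skipping this lookup.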
