Abstract

This paper addresses the problem of instance-level 6DoF pose estimation from a single RGBD image of an indoor scene. Many recent works have shown that a two-stage network, which first detects keypoints and then regresses the 6D pose from them, achieves remarkable performance. However, previous methods pay little attention to channel-wise attention, and their keypoints are not selected with comprehensive use of the RGBD information, which limits the performance of the network. To enhance the representation ability of RGB features, a modular Split-Attention block that enables attention across feature-map groups is proposed. In addition, by combining Oriented FAST and Rotated BRIEF (ORB) keypoints with the Farthest Point Sampling (FPS) algorithm, a simple but effective keypoint selection method named ORB-FPS is presented, which prevents keypoints from falling on non-salient regions. The proposed algorithm is evaluated on the Linemod and YCB-Video datasets; the experimental results demonstrate that our method outperforms current approaches, achieving ADD(S) accuracy of 94.5% on the Linemod dataset and 91.4% on the YCB-Video dataset.
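For illustration, the following is a minimal sketch of how a split-attention block of this kind can be implemented. The module name, radix/reduction settings, and layer sizes are assumptions chosen for the example, not the paper's exact design.

```python
# Minimal sketch of a Split-Attention block (channel attention across
# feature-map groups). Settings below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SplitAttention(nn.Module):
    def __init__(self, channels: int, radix: int = 2, reduction: int = 4):
        super().__init__()
        self.radix = radix
        inter = max(channels * radix // reduction, 32)
        # Grouped conv produces `radix` feature-map splits.
        self.conv = nn.Conv2d(channels, channels * radix, 3, padding=1,
                              groups=radix, bias=False)
        self.bn = nn.BatchNorm2d(channels * radix)
        # Two 1x1 convs compute per-split channel attention weights.
        self.fc1 = nn.Conv2d(channels, inter, 1)
        self.fc2 = nn.Conv2d(inter, channels * radix, 1)

    def forward(self, x):
        b, c = x.shape[0], x.shape[1]
        splits = F.relu(self.bn(self.conv(x)))
        splits = splits.view(b, self.radix, c, *splits.shape[2:])
        # Fuse the splits, then squeeze spatially to a channel descriptor.
        gap = splits.sum(dim=1).mean(dim=(2, 3), keepdim=True)
        att = self.fc2(F.relu(self.fc1(gap)))          # (b, c*radix, 1, 1)
        att = F.softmax(att.view(b, self.radix, c, 1, 1), dim=1)
        # Attention-weighted sum of the splits.
        return (att * splits).sum(dim=1)


if __name__ == "__main__":
    block = SplitAttention(channels=64)
    print(block(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```

The softmax over the radix dimension is what distributes channel-wise attention across the feature-map groups, which is the property the abstract refers to.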

Highlights


  • Convolutional Neural Networks (CNNs) are used in pose estimation; for example, PVNet [7] regresses pixel-wise vectors pointing towards the keypoints and votes for their locations

  • In order to make comprehensive use of image and point cloud information, we propose the 3D keypoint voting network (3DKV) for 6DoF pose estimation


Summary

Introduction

6DoF pose estimation is a fundamental task in many computer vision applications, such as augmented reality [1], autonomous navigation [2,3], robot grasping [4,5] and intelligent manufacturing. In order to make comprehensive use of image and point cloud information, we propose the 3D keypoint voting network (3DKV) for 6DoF pose estimation. For keypoint selection, 2D ORB keypoints detected on the RGB image are lifted to 3D points in the point cloud through the camera parameters, and the final 3D ORB-FPS keypoints are obtained by applying Farthest Point Sampling (FPS) to the selected points. This improves the ability of the keypoints to characterize objects and prevents them from falling on non-salient areas such as smooth surfaces, making the keypoints easier to locate and improving pose estimation. (2) A simple and effective keypoint selection approach named ORB-FPS is proposed. It uses a two-stage procedure to select keypoints and avoids placing them on non-salient areas such as smooth surfaces, making them easier to locate and improving the network's ability to estimate the pose.
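A minimal sketch of the ORB-FPS idea follows: ORB keypoints are detected on the RGB image, back-projected to 3D with the depth map and camera intrinsics, and the final keypoints are chosen by Farthest Point Sampling. The function name, parameter defaults, and the simple per-pixel depth look-up are assumptions made for this example, not the paper's exact procedure.

```python
# Illustrative ORB-FPS keypoint selection sketch (not the authors' code).
import cv2
import numpy as np


def orb_fps_keypoints(rgb, depth, K, num_keypoints=8, max_orb=500):
    """rgb: HxWx3 uint8, depth: HxW in metres, K: 3x3 camera intrinsics."""
    orb = cv2.ORB_create(nfeatures=max_orb)
    kps = orb.detect(cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY), None)

    # Back-project each 2D ORB keypoint to a 3D point using the depth map.
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    pts3d = []
    for kp in kps:
        u, v = int(round(kp.pt[0])), int(round(kp.pt[1]))
        z = depth[v, u]
        if z > 0:  # skip pixels with missing depth
            pts3d.append([(u - cx) * z / fx, (v - cy) * z / fy, z])
    pts3d = np.asarray(pts3d, dtype=np.float32)
    if len(pts3d) == 0:
        return pts3d

    # Farthest Point Sampling: greedily keep points that are maximally spread.
    selected = [0]
    dists = np.linalg.norm(pts3d - pts3d[0], axis=1)
    for _ in range(1, min(num_keypoints, len(pts3d))):
        idx = int(np.argmax(dists))
        selected.append(idx)
        dists = np.minimum(dists, np.linalg.norm(pts3d - pts3d[idx], axis=1))
    return pts3d[selected]
```

Restricting FPS to ORB detections is what keeps the sampled keypoints on textured, salient regions rather than on smooth surfaces.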

Pose from RGB Images
Pose from Point Cloud
Pose from RGBD Data
Methodology
Split-Attention for Image Feature Extraction
Pointcloud Feature Extraction
ORB-FPS Keypoint Selection
Instance Semantic Segmentation
Center Point Voting
Pose Calculation
Experiments
Datasets
Evaluation Metrics
Implementation Details
Results on the Linemod Dataset
Results on the YCB-Video Dataset
Ablation Study
Conclusions

