Abstract

6D object pose estimation is an essential task in vision-based robotic grasping and manipulation. Prior works estimate an object's 6D pose by regressing from a single RGB-D frame, without accounting for occluded objects in that frame, which limits their performance in human-robot collaboration scenarios with heavy occlusion. In this paper, we present an end-to-end model named \textit{TemporalFusion}, which integrates temporal motion information from RGB-D images for 6D object pose estimation. The core of the proposed model is to embed and fuse the temporal motion information from multi-frame RGB-D sequences, which allows it to handle heavy occlusion in human-robot collaboration tasks. Furthermore, the proposed deep model also yields stable pose sequences, which is essential for real-time robotic grasping tasks. We evaluated the proposed method on the YCB-Video dataset, and experimental results show that our model outperforms state-of-the-art approaches. Our code is available at https://github.com/mufengjun260/H-MPose.
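
To make the core idea concrete, the sketch below illustrates one generic way to fuse per-frame RGB-D features over time before regressing a pose. It is a minimal illustration, not the authors' TemporalFusion architecture: the module names, feature sizes, and the choice of a GRU for temporal aggregation are all assumptions made for this example.

```python
# Minimal sketch of temporal feature fusion for 6D pose regression.
# NOT the paper's TemporalFusion model -- only illustrates aggregating
# per-frame RGB-D features across a sequence so that unoccluded earlier
# frames can compensate for occlusion in the current frame.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalPoseSketch(nn.Module):
    def __init__(self, feat_dim=256, hidden_dim=256):
        super().__init__()
        # Per-frame encoder: maps a 4-channel RGB-D frame to a feature vector.
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(4, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),  # -> (B*T, feat_dim, 1, 1)
        )
        # Temporal fusion: a GRU aggregates features across the frame sequence.
        self.temporal = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        # Pose head: regresses a quaternion (4) and a translation (3).
        self.pose_head = nn.Linear(hidden_dim, 7)

    def forward(self, rgbd_seq):
        # rgbd_seq: (B, T, 4, H, W) -- a short clip of RGB-D frames.
        b, t = rgbd_seq.shape[:2]
        frames = rgbd_seq.flatten(0, 1)                # (B*T, 4, H, W)
        feats = self.frame_encoder(frames).flatten(1)  # (B*T, feat_dim)
        feats = feats.view(b, t, -1)                   # (B, T, feat_dim)
        fused, _ = self.temporal(feats)                # (B, T, hidden_dim)
        out = self.pose_head(fused[:, -1])             # pose from last state
        quat = F.normalize(out[:, :4], dim=1)          # unit quaternion
        trans = out[:, 4:]
        return quat, trans


if __name__ == "__main__":
    model = TemporalPoseSketch()
    clip = torch.randn(2, 5, 4, 64, 64)  # batch of 2 five-frame RGB-D clips
    q, t = model(clip)
    print(q.shape, t.shape)  # torch.Size([2, 4]) torch.Size([2, 3])
```

Regressing from the fused temporal state, rather than from a single frame's features, is what gives a model of this kind robustness to transient occlusion and smoother pose sequences over time.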
