Abstract

Grasp pose estimation is a crucial procedure in robotic manipulation. Most current robotic grasping systems are built on frame-based cameras such as RGB-D cameras. However, traditional frame-based grasp pose estimation methods encounter challenges in scenarios that demand high dynamic range and low power consumption. In this work, a neuromorphic vision sensor (DAVIS) is introduced to the field of robotic grasping. DAVIS is an event-based, bio-inspired vision sensor that records asynchronous streams of local pixel-level light intensity changes, called events. DAVIS offers high temporal resolution, high dynamic range, low power consumption, and freedom from motion blur. We construct NeuroGrasp, a neuromorphic vision-based robotic grasping dataset of 154 moving objects, which is, to the best of our knowledge, the first RGB-Event multi-modality grasp dataset. The dataset records both RGB frames and the corresponding event streams, providing frame data with rich color and texture information alongside event streams with high temporal resolution and high dynamic range. Based on the NeuroGrasp dataset, we further develop a multi-modal neural network with a specific Euler-Region-Regression sub-Network (ERRN) to perform grasp pose estimation. By combining frame-based and event-based vision, the proposed method achieves better performance on the NeuroGrasp dataset than methods that take only RGB frames or only event streams as input.
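To make the event-stream modality concrete, below is a minimal sketch of how asynchronous DAVIS events, each a tuple of pixel coordinates, timestamp, and polarity, might be accumulated into a dense image so they can be fed to a convolutional network alongside an RGB frame. This is an illustrative assumption, not the paper's actual preprocessing; the function name `events_to_frame` and the two-channel count representation are hypothetical.

```python
import numpy as np

def events_to_frame(events, height, width, t_start, t_end):
    """Accumulate events in [t_start, t_end) into a 2-channel count image.

    `events` is an (N, 4) array with columns (x, y, t, polarity), where
    polarity is +1 if the pixel brightened and -1 if it darkened.
    Channel 0 counts positive events, channel 1 counts negative events.
    (Hypothetical representation, not the paper's actual encoding.)
    """
    frame = np.zeros((2, height, width), dtype=np.float32)
    in_window = (events[:, 2] >= t_start) & (events[:, 2] < t_end)
    for x, y, _, p in events[in_window]:
        channel = 0 if p > 0 else 1
        frame[channel, int(y), int(x)] += 1.0
    return frame
```

A fixed-window count image like this is only one of several common event representations (others include time surfaces and voxel grids); whichever is chosen, the result is a tensor that a multi-modal network can fuse with the synchronized RGB frame.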
