Abstract

Accurate object classification and position estimation are crucial for executing autonomous pick-and-place operations by a robot and can be realized using RGB-D sensors, which are becoming increasingly available for industrial applications. In this paper, we present a novel unified framework for object detection and classification using a combination of point cloud processing and deep learning techniques. The proposed model uses two streams that recognize objects on RGB and depth data separately and combines the two in later stages to classify objects. Experimental evaluation of the proposed model, including classification accuracy compared with previous works, demonstrates its effectiveness and efficiency, making the model suitable for real-time applications. In particular, experiments performed on the Washington RGB-D object dataset show that the proposed framework has 97.5% and 95% fewer parameters compared to the previous state-of-the-art multimodal neural networks Fus-CNN, CNN Features and VGG3D, respectively, at the cost of an approximately 5% drop in classification accuracy. Moreover, inference with the proposed framework takes 66.11%, 32.65%, and 28.77% less time on GPU and 86.91%, 51.12%, and 50.15% less time on CPU in comparison to VGG3D, Fus-CNN, and CNN Features, respectively. The potential applicability of the developed object classification and position estimation framework was then demonstrated on an experimental robot-manipulation setup realizing a simplified object pick-and-place scenario. In approximately 95% of test trials, the system was able to accurately position the robot over the detected objects of interest in automatic mode, ensuring stable cyclic execution with no time delays.
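
As a concrete illustration of the two-stream design described above, the following is a minimal PyTorch sketch of a late-fusion RGB-D classifier; the class name (TwoStreamRGBD), layer sizes, and fusion point are illustrative assumptions, not the paper's exact architecture:

import torch
import torch.nn as nn

class TwoStreamRGBD(nn.Module):
    """Illustrative two-stream network: RGB and depth are encoded
    separately and their features are fused before the classifier."""
    def __init__(self, num_classes=51):  # Washington RGB-D has 51 categories
        super().__init__()
        self.rgb_stream = self._make_stream(in_channels=3)
        self.depth_stream = self._make_stream(in_channels=1)
        self.classifier = nn.Sequential(
            nn.Linear(2 * 128, 256), nn.ReLU(inplace=True),
            nn.Linear(256, num_classes),
        )

    @staticmethod
    def _make_stream(in_channels):
        # Compact per-modality encoder: three strided convs + global pooling.
        return nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (N, 128) per stream
        )

    def forward(self, rgb, depth):
        # Each modality is recognized separately; the two feature vectors
        # are then concatenated and classified jointly (late fusion).
        fused = torch.cat([self.rgb_stream(rgb), self.depth_stream(depth)], dim=1)
        return self.classifier(fused)

logits = TwoStreamRGBD()(torch.randn(2, 3, 224, 224), torch.randn(2, 1, 224, 224))

Keeping the fusion at the feature level allows each modality to be handled by a compact encoder, which is one way such two-stream networks stay far smaller than duplicated VGG-scale backbones.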

Highlights

  • Industrial robot-manipulators are being widely deployed in manufacturing, warehouses, and other environments for autonomous object manipulation tasks involving repetitive pick-and-place operations such as part picking and placing, product packaging, bin-picking, and kitting

  • We have presented a novel unified framework for object detection/classification and spatial position estimation using a combination of point cloud processing and deep learning techniques

  • Experimental evaluation of the proposed convolutional neural network (CNN) model, including a comparison with previous works, demonstrates its effectiveness and efficiency


Introduction

Industrial robot-manipulators are being widely deployed in manufacturing, warehouses, and other environments for autonomous object manipulation tasks involving repetitive pick-and-place operations such as part picking and placing, product packaging, bin-picking, and kitting. End-effector-mounted or fixed monocamera and multicamera stereo vision systems have been used to realize 2D visual servo control schemes of industrial robots for autonomous object grasping and manipulation [3,4]. Such visual servoing schemes were based on non-trivial analytical derivations of robot–target interaction matrices relating 2D camera image features to robot kinematics. The availability of compact optical 2D laser scanner and laser rangefinder sensors led to the development of various combined 2D and 3D vision end-effector-mounted systems able to perform RGB image-based object classification and 3D point cloud composition through successive scanning of the working area by a robot, e.g., as in [6,7]. Accurate object detection or classification and 3D position/pose estimation based on RGB-D data processing is currently an active research direction that attracted much attention upon the launch of the Amazon Picking Challenge.
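
For context, such interaction-matrix-based schemes typically follow the classical image-based visual servoing formulation (as in Chaumette and Hutchinson's tutorials); the equations below are this standard textbook form, given only as background rather than the specific derivations of [3,4]:

\dot{\mathbf{s}} = \mathbf{L}_s \,\mathbf{v}_c, \qquad \mathbf{e} = \mathbf{s} - \mathbf{s}^{*}, \qquad \mathbf{v}_c = -\lambda\, \widehat{\mathbf{L}}_s^{+}\, \mathbf{e}

where s is the vector of measured image features, s* its desired value, e the feature error, v_c the commanded camera (end-effector) velocity, λ > 0 a control gain, and the hatted pseudoinverse denotes the Moore–Penrose pseudoinverse of an estimate of the interaction matrix L_s.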

