Abstract

Recognizing objects and estimating their poses have a wide range of applications in robotics. For instance, to grasp objects, robots need the 3D position and orientation of the objects. The task becomes challenging in a cluttered environment with different types of objects. A popular approach to tackle this problem is to utilize a deep neural network for object recognition. However, deep learning-based object detection in cluttered environments requires a substantial amount of data, and collecting these data requires time and extensive human labor for manual labeling. In this study, our objective was the development and validation of a deep object recognition framework using a synthetic depth image dataset. We synthetically generated a depth image dataset of 22 objects randomly placed in a 0.5 m × 0.5 m × 0.1 m box, and automatically labeled all objects with an occlusion rate below 70%. The Faster Region-based Convolutional Neural Network (Faster R-CNN) architecture was adopted for training using a dataset of 800,000 synthetic depth images, and its performance was tested on a real-world depth image dataset consisting of 2000 samples. The deep object recognizer achieved 40.96% detection accuracy on the real depth images and 93.5% on the synthetic depth images. Training the deep learning model with noise-added synthetic images improved the recognition accuracy on real images to 46.3%. The object detection framework can be trained on synthetically generated depth data and then employed for object recognition on real depth data in a cluttered environment. Synthetic depth data-based deep object detection has the potential to substantially decrease the time and human effort required for extensive data collection and labeling.
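
The following Python sketch illustrates two steps summarized above: keeping automatic labels only for objects with an occlusion rate below 70%, and adding sensor-like noise to clean synthetic depth images before training. The noise model, function names, and data layout are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch of two steps described in the abstract: discarding labels
# for heavily occluded objects, and perturbing clean synthetic depth maps with
# simple sensor-like noise. All names and parameters are assumptions.
import numpy as np

OCCLUSION_THRESHOLD = 0.70  # from the abstract: keep objects occluded less than 70%

def filter_labels(objects):
    """Keep bounding-box labels only for sufficiently visible objects.

    `objects` is assumed to be a list of dicts with 'bbox', 'visible_px'
    (pixels visible in the rendered view) and 'total_px' (pixels the object
    would cover if nothing occluded it).
    """
    kept = []
    for obj in objects:
        occlusion = 1.0 - obj["visible_px"] / max(obj["total_px"], 1)
        if occlusion < OCCLUSION_THRESHOLD:
            kept.append(obj)
    return kept

def add_depth_noise(depth_m, sigma_base=0.002, dropout_prob=0.01, rng=None):
    """Perturb a clean synthetic depth map (in meters) with depth-dependent
    Gaussian noise plus random pixel dropout (zeros), roughly mimicking a
    consumer depth camera. Noise parameters here are illustrative only.
    """
    rng = rng or np.random.default_rng()
    noisy = depth_m + rng.normal(0.0, sigma_base, depth_m.shape) * (1.0 + depth_m)
    dropout = rng.random(depth_m.shape) < dropout_prob
    noisy[dropout] = 0.0  # simulate missing measurements
    return np.clip(noisy, 0.0, None)
```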

Highlights

  • Robust object detection and recognition are fundamental aspects of grasping, robot manipulation, human-robot interaction and augmented reality

  • We present a deep learning (DL)-based object recognition framework using synthetic depth images

  • The models trained on clean and noise-added synthetic images were tested for object recognition on real-world depth images

Introduction

Robust object detection and recognition are fundamental aspects of grasping, robot manipulation, human-robot interaction and augmented reality. Cluttered environments, occlusion between objects, lighting conditions and small deformable objects remain challenges. Objects may appear at different scales and in different forms depending on the camera viewpoint and calibration. Accurate scene understanding, including object detection and pixel-wise semantic segmentation, is crucial for practical interaction with real-world objects. The goal of the Amazon Picking Challenge 2017 was to construct a robotic system that can pick items from a warehouse shelf and place them into a box. Teams utilized a wide range of sensors, perception and motion planning algorithms [1]. The first step in this pick-and-place task was the detection and recognition of the objects. Most teams used deep learning (DL) to tackle this problem [2].

