Abstract

In this paper, a novel deep convolutional neural network (CNN) based high-level multi-task control architecture is proposed to address the visual guide-and-pick control problem of an omnidirectional mobile manipulator platform. The proposed mobile manipulator control system uses only a stereo camera as its sensing device to accomplish the visual guide-and-pick task. After the stereo camera captures a stereo image of the scene, the proposed CNN-based high-level multi-task controller directly predicts the best motion guidance and picking action of the omnidirectional mobile manipulator from the captured image. To collect the training dataset, we manually controlled the mobile manipulator to navigate an indoor environment, approaching and picking up an object-of-interest (OOI), and recorded all captured stereo images together with the corresponding robot control commands issued during this manual teaching stage. In the training stage, we employed an end-to-end multi-task imitation learning technique to train the proposed CNN model, learning the desired motion and picking control strategies from the expert demonstrations so that the system can visually guide the mobile platform and then visually pick up the OOI. Experimental results show that the proposed visually guided picking control system achieves an average picking success rate of about 78.2%.
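To make this pipeline concrete, the following is a minimal sketch of such a multi-task network: a shared convolutional trunk fed with the stacked stereo pair, plus two task-specific heads, one for motion guidance and one for the picking action, trained by behavior cloning on the recorded expert commands. It is written against the modern tf.keras API rather than the TensorFlow 1.5 graph API the authors report, and the input resolution, action sets, and loss weights are illustrative assumptions, not details taken from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Hypothetical dimensions: the paper does not specify the input resolution
# or the exact action sets, so these values are illustrative assumptions.
IMG_H, IMG_W = 120, 160      # per-camera image size (assumed)
NUM_MOTION_CMDS = 5          # e.g., forward / left / right / rotate / stop (assumed)
NUM_PICK_CMDS = 2            # pick / no-pick (assumed)

def build_multitask_cnn():
    # Stereo pair stacked channel-wise: two RGB images -> 6 input channels.
    x_in = layers.Input(shape=(IMG_H, IMG_W, 6), name="stereo_pair")

    # Shared convolutional trunk extracts features used by both tasks.
    x = x_in
    for filters in (32, 64, 128):
        x = layers.Conv2D(filters, 3, strides=2, padding="same",
                          activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation="relu")(x)

    # Task-specific heads: one for motion guidance, one for picking.
    motion = layers.Dense(NUM_MOTION_CMDS, activation="softmax", name="motion")(x)
    pick = layers.Dense(NUM_PICK_CMDS, activation="softmax", name="pick")(x)
    return Model(x_in, [motion, pick])

model = build_multitask_cnn()

# Behavior cloning: each head is trained against the expert's recorded
# commands with a cross-entropy loss; equal loss weights are an assumption.
model.compile(
    optimizer="adam",
    loss={"motion": "sparse_categorical_crossentropy",
          "pick": "sparse_categorical_crossentropy"},
    loss_weights={"motion": 1.0, "pick": 1.0},
)
# model.fit(stereo_images, {"motion": motion_labels, "pick": pick_labels}, ...)
```

Summing per-head cross-entropy losses over a shared trunk is one standard way to train both tasks jointly from the same demonstrations; at run time, the predicted motion command would drive the omnidirectional base until the pick head signals that grasping should begin.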

Highlights

  • We propose a new high-level multi-task control architecture based on a convolutional neural network (CNN) to learn the optimal guiding and picking actions of an omnidirectional mobile manipulator from stereo observations of the scene.

  • We implemented the proposed CNN-based guide-and-pick control system using TensorFlow 1.5.0 on a laptop equipped with a 2.4 GHz Intel Core i7-5500U, 8 GB of system memory, and the Ubuntu 16.04 operating system.

Introduction

Research on visual servoing of robot manipulators has received increasing attention because such control methods provide a robust solution for many robotic automation applications, e.g., agricultural harvesting [1], [2], bin picking [3], [4], and object grasping [5], [6]. Among these applications, grasping and navigation control play an important role in enabling a robot manipulator system to achieve autonomous manipulation tasks [7], [8] across a variety of industrial and service scenarios. Model-based methods, however, often spend considerable time on scene interpretation, task-level reasoning, and 3D object pose estimation.
