Abstract

Controlling robots through natural language (NL) is attracting increasing attention for its versatility and convenience, and because it requires no extensive training for users. Grounding, i.e., enabling robots to understand NL instructions from humans, is a crucial challenge in this problem. This paper mainly explores the object grounding problem and concretely studies how to detect target objects specified by NL instructions using an RGB-D camera in robotic manipulation applications. In particular, a simple yet robust vision algorithm is applied to segment objects of interest. Using the metric information of all segmented objects, object attributes and the relations between objects are then extracted. NL instructions that incorporate multiple cues for object specification are parsed into domain-specific annotations. The annotations from NL and the information extracted from the RGB-D camera are matched in a computational state estimation framework that searches all possible object grounding states. The final grounding is accomplished by selecting the state with the maximum probability. An RGB-D scene dataset, associated with different groups of NL instructions based on different cognition levels of the robot, is collected. Quantitative evaluations on the dataset illustrate the advantages of the proposed method. Experiments on NL-controlled object manipulation and NL-based task programming using a mobile manipulator show its effectiveness and practicability in robotic applications.
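The grounding-by-state-estimation idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's actual implementation: the attribute and relation scoring functions, the data model for segments and cues, and the relation names (`left_of`, `on`) are all hypothetical assumptions. A grounding state assigns each parsed object phrase to one segmented object; every assignment is scored by how well the visual attributes and metric relations match the linguistic cues, and the maximum-probability state is selected.

```python
from itertools import permutations

def attribute_score(cues, obj):
    """Smoothed probability-like score of one segment matching one phrase's attribute cues."""
    if not cues:
        return 1.0
    hits = sum(1 for k, v in cues.items() if obj.get(k) == v)
    return (hits + 1) / (len(cues) + 1)

def relation_score(relation, obj_a, obj_b):
    """Score a spatial relation cue between two segments (illustrative geometry only)."""
    if relation == "left_of":
        return 1.0 if obj_a["x"] < obj_b["x"] else 0.1
    if relation == "on":
        return 1.0 if abs(obj_a["x"] - obj_b["x"]) < 0.05 and obj_a["z"] > obj_b["z"] else 0.1
    return 0.5  # unknown relation: uninformative

def ground(phrase_cues, relations, segments):
    """Search all assignments of phrases to segments; return the best one.

    phrase_cues: list of attribute-cue dicts, one per referenced object phrase.
    relations:   list of (rel, i, j) tuples over phrase indices.
    segments:    list of dicts holding visual attributes and metric position.
    """
    best, best_p = None, -1.0
    # Enumerate every grounding state (injective phrase-to-segment assignment).
    for assign in permutations(range(len(segments)), len(phrase_cues)):
        p = 1.0
        for i, cues in enumerate(phrase_cues):
            p *= attribute_score(cues, segments[assign[i]])
        for rel, i, j in relations:
            p *= relation_score(rel, segments[assign[i]], segments[assign[j]])
        if p > best_p:
            best, best_p = assign, p
    return best, best_p
```

For example, "the red cup left of the box" would yield two cue dicts and one relation tuple; with a red cup at x = 0.1 and a box at x = 0.4, the assignment mapping the first phrase to the cup scores highest. Exhaustive search is tractable here only because manipulation scenes contain few objects.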

Highlights

  • As assistants to human beings, robots are moving into more service-oriented roles in human life, in both living and working environments

  • For action grounding, i.e., transferring actions described in natural language (NL) to a set of defined robot actions, a set of mapping rules can be predefined or learned for the actions considered (Sensors 2016, 16, 2117; doi:10.3390/s16122117)

  • The contribution of this paper is three-fold: (i) we formulate the problem of NL-based target object detection as the state estimation in the space of all possible object grounding states according to visual object segmentation results and extracted linguistic object cues; (ii) an RGB-D scene dataset as well as different groups of NL instructions based on different cognition levels of the robot are collected for evaluation of target object detection in robotic manipulation applications; and (iii) we show quantitative evaluation results on the dataset and experimentally validate the effectiveness and practicability of the proposed method on the applications of NL controlled object manipulation and NL-based task programming using our mobile manipulator system
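The predefined-mapping approach to action grounding mentioned in the highlights can be sketched as a simple lookup table. The verb set and primitive names below are illustrative assumptions, not the paper's actual vocabulary:

```python
# Hypothetical mapping from NL action verbs to robot action primitives.
ACTION_MAP = {
    "pick up": "grasp",
    "grab":    "grasp",
    "put":     "place",
    "place":   "place",
    "move":    "transfer",
    "push":    "push",
}

def ground_action(verb_phrase):
    """Return the robot primitive for a verb phrase, or None if unmapped."""
    return ACTION_MAP.get(verb_phrase.lower().strip())
```

A learned variant would replace the table with a classifier over verb phrases, but the closed set of manipulation primitives is what keeps either approach feasible.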


Summary

Introduction

As assistants to human beings, robots are moving into more service-oriented roles in human life, in both living and working environments. This work mainly explores the object grounding problem and concretely studies how to detect target objects specified by NL instructions using an RGB-D camera in robotic manipulation tasks. The contribution of this paper is three-fold: (i) we formulate the problem of NL-based target object detection as state estimation in the space of all possible object grounding states, according to visual object segmentation results and extracted linguistic object cues; (ii) an RGB-D scene dataset, as well as different groups of NL instructions based on different cognition levels of the robot, is collected for evaluation of target object detection in robotic manipulation applications; and (iii) we show quantitative evaluation results on the dataset and experimentally validate the effectiveness and practicability of the proposed method on the applications of NL-controlled object manipulation and NL-based task programming using our mobile manipulator system.

Related Work
Problem Formulation
Segmenting Objects of Interest on the Planar Surface
Identifying Relations between Objects
Learning Object Attributes
Natural Language Processing
Datasets of RGB-D Scenes and NL Instructions
Target Object Detection Results
Application on NL Controlled Object Manipulation
Application on NL-Based Task Programming
Conclusions

