INGRESS: Interactive visual grounding of referring expressions

Mohit Shridhar, David Hsu, Dixant Mittal

doi:10.1177/0278364919897133

Abstract

This article presents INGRESS, a robot system that follows human natural language instructions to pick and place everyday objects. The key challenge is grounding referring expressions: understanding expressions about objects and their relationships from image and natural language inputs. INGRESS allows unconstrained object categories and rich language expressions. Further, it interactively asks questions to clarify ambiguous referring expressions. To achieve this, we take the approach of grounding by generation and propose a two-stage neural-network model for grounding. The first stage uses a neural network to generate visual descriptions of objects, compares them with the input language expression, and identifies a set of candidate objects. The second stage uses another neural network to examine all pairwise relations between the candidates and infers the most likely referred objects. The same neural networks are used for both grounding and question generation for disambiguation. Experiments show that INGRESS outperformed a state-of-the-art method on the RefCOCO dataset and in robot experiments with humans. The INGRESS source code is available at https://github.com/MohitShridhar/ingress.
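
The two-stage flow described in the abstract can be illustrated with a small toy sketch. This is not the INGRESS implementation: the neural networks are replaced by pre-written captions and a simple word-overlap score, and every name here (`similarity`, `select_candidates`, `rank_by_relations`, `ground`, the `threshold` and `margin` values, and the example scene) is hypothetical, chosen only to show how candidate selection, pairwise relation scoring, and a clarifying question might fit together.

```python
from itertools import permutations


def similarity(a: str, b: str) -> float:
    """Jaccard word overlap; a toy stand-in for a learned text-matching score."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def select_candidates(expression, self_captions, threshold=0.2):
    """Stage 1: keep objects whose (pre-generated) description matches the expression."""
    return [obj for obj, cap in self_captions.items()
            if similarity(expression, cap) >= threshold]


def rank_by_relations(expression, candidates, self_captions, relation_captions):
    """Stage 2: score every ordered candidate pair by its relational description."""
    # Start from each object's own description, then let relations improve the score.
    best = {obj: (similarity(expression, self_captions[obj]), self_captions[obj])
            for obj in candidates}
    for a, b in permutations(candidates, 2):
        cap = relation_captions.get((a, b))
        if cap is not None:
            score = similarity(expression, cap)
            if score > best[a][0]:
                best[a] = (score, cap)
    # sorted() is stable, so ties keep the candidate order from stage 1.
    return sorted(((obj, s, c) for obj, (s, c) in best.items()),
                  key=lambda item: item[1], reverse=True)


def ground(expression, self_captions, relation_captions, margin=0.1):
    """Return (object, None) when confident, else (None, clarifying question)."""
    candidates = select_candidates(expression, self_captions)
    if not candidates:
        return None, "I could not find that object. Could you describe it differently?"
    if len(candidates) == 1:
        return candidates[0], None
    ranked = rank_by_relations(expression, candidates, self_captions, relation_captions)
    (top, top_score, top_caption), (_, second_score, _) = ranked[0], ranked[1]
    if top_score - second_score < margin:
        # Ambiguous: reuse the generated description to ask a question.
        return None, f"Do you mean {top_caption}?"
    return top, None


# Hypothetical scene: two identical cups and a bowl.
self_captions = {"cup_1": "a red cup", "cup_2": "a red cup", "bowl": "a blue bowl"}
relation_captions = {
    ("cup_1", "cup_2"): "the red cup on the left",
    ("cup_2", "cup_1"): "the red cup on the right",
}

print(ground("the red cup on the left", self_captions, relation_captions))
print(ground("the red cup", self_captions, relation_captions))
```

In this sketch the descriptions serve two purposes, echoing the abstract's point that the same models support both grounding and question generation: they are compared against the input expression to score objects, and when the top two scores are too close the best description is turned into a clarifying question.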

Journal: The International Journal of Robotics Research
Publication Date: Jan 2, 2020
Citations: 58

Similar Papers

Interactive Visual Grounding of Referring Expressions for Human-Robot Interaction
Mohit Shridhar ... David Hsu
26 Jun 2018

Learning a bidirectional mapping between human whole-body motion and natural language using deep recurrent neural networks
Matthias Plappert ... Tamim Asfour
Robotics and Autonomous Systems | VOL. 109
02 Aug 2018

Developments in The Field of Natural Language Processing
International Journal of Advanced Research in Computer Science | VOL. 8
30 Apr 2017

AI Model to Generate SQL Queries from Natural Language Instructions through Voice
Aditya Sawant ... Rohit Raina
Journal of Physics: Conference Series | VOL. 2273
01 May 2022
