Abstract

Like specific natural language instructions, intention-related natural language queries play an essential role in everyday communication. Inspired by the psychological concept of "affordance" and its applications in human-robot interaction, we propose an object affordance-based visual grounding architecture for intention-related natural language queries. We first present an attention-based multi-visual-feature fusion network to detect object affordances from RGB images. By fusing deep visual features extracted from a pre-trained CNN with deep texture features encoded by a deep texture encoding network, the affordance detection network accounts for the interaction between the feature streams and preserves their complementary nature by integrating attention weights learned from sparse representations of the multi-visual features. We train and validate the network on a self-built dataset whose images largely originate from MSCOCO and ImageNet. We then introduce an intention semantic extraction module to extract intention semantics from intention-related natural language queries. Finally, we ground such queries by integrating the detected object affordances with the extracted intention semantics. Extensive experiments validate the performance of both the object affordance detection network and the overall grounding architecture.
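The fusion step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature dimension, the soft-thresholding used as a stand-in for "sparse representations", and the per-stream scoring are all assumptions made for the example.

```python
import numpy as np

# Hypothetical feature dimension; the paper does not specify backbone sizes here.
D = 512
rng = np.random.default_rng(0)

cnn_feat = rng.standard_normal(D)      # deep visual features from a pre-trained CNN
texture_feat = rng.standard_normal(D)  # deep texture features from a texture encoding network

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sparse_score(f, thresh=0.5):
    # Stand-in for a sparse representation: soft-threshold the features,
    # then score the stream by the L1 mass that survives.
    sparse = np.sign(f) * np.maximum(np.abs(f) - thresh, 0.0)
    return np.abs(sparse).sum()

# Attention weights over the two streams, derived from their sparse representations.
weights = softmax(np.array([sparse_score(cnn_feat), sparse_score(texture_feat)]))

# Attention-weighted fusion keeps complementary information from both streams.
fused = weights[0] * cnn_feat + weights[1] * texture_feat
print(fused.shape)
```

In the paper the attention weights are learned; here they are computed directly from the sparse scores only to make the weighting mechanism concrete.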

Highlights

  • Human beings live in a multi-modal environment, where natural language and vision are the dominant channels for communication and perception

  • Inspired by Latent Semantic Analysis (LSA), which measures the similarity in meaning between words and text documents, we propose a semantic-metric-based approach to map detected affordances to intention-related natural language queries

  • We propose an architecture that integrates an object affordance detection network with an intention-semantic extraction module to ground intention-related natural language queries
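The LSA-inspired mapping in the highlights can be sketched as below. The affordance labels, their toy textual descriptions, and the query are hypothetical; a numpy-only truncated SVD stands in for a full LSA pipeline.

```python
import numpy as np

# Toy corpus: one short description per affordance label (illustration only).
affordances = {
    "drink": "water juice coffee thirsty drink cup mug",
    "read":  "book text document page read words",
    "eat":   "food bread apple hungry eat",
}
query = "i am thirsty and want some water"

# Shared vocabulary and bag-of-words vectors.
docs = list(affordances.values())
vocab = sorted({w for d in docs + [query] for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

def bow(text):
    v = np.zeros(len(vocab))
    for w in text.split():
        v[index[w]] += 1
    return v

X = np.stack([bow(d) for d in docs])  # document-term count matrix

# LSA: project into a low-rank latent semantic space via truncated SVD.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2

def project(v):
    return v @ Vt[:k].T  # map a bag-of-words vector to k latent dimensions

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# Semantic metric: cosine similarity between the query and each affordance
# description in the latent space; the best-scoring affordance grounds the query.
q = project(bow(query))
scores = {label: cosine(q, project(bow(d))) for label, d in affordances.items()}
best = max(scores, key=scores.get)
print(best)
```

The query shares "thirsty" and "water" only with the drink description, so the metric maps it to the "drink" affordance.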

Summary

Introduction

Human beings live in a multi-modal environment, where natural language and vision are the dominant channels for communication and perception. We would like to develop intelligent agents with the ability to communicate and to perceive their working scenarios as humans do. We often refer to objects in the environment during pragmatic interaction with others, and we can comprehend both specific and intention-related natural language queries in a wide range of practical applications. Cognitive psychologist Don Norman discussed affordance from the design perspective, so that the function of objects could be perceived. He argued that affordance refers to the fundamental properties of an object and determines how the object could possibly be used (Norman, 1988). According to Norman's viewpoint, drinks afford drinking, foods afford eating, and reading materials, such as text documents, afford reading.

