Abstract

Crossmodal interaction in situated language comprehension is important for effective and efficient communication. The relationship between linguistic and visual stimuli is mutually beneficial: vision contributes information that improves language understanding, while language in turn helps drive the focus of attention in the visual environment. However, language and vision are two different representational modalities, which accommodate different aspects and granularities of conceptualization. Integrating them into a single, coherent system remains a challenge, one that could profit from inspiration by human crossmodal processing. Based on fundamental psycholinguistic insights into the nature of situated language comprehension, we derive a set of performance characteristics that facilitate robust language understanding, such as crossmodal reference resolution, attention guidance, and predictive processing. Artificial systems for language comprehension should meet these characteristics in order to perform in a natural and smooth manner. We discuss how empirical findings on the crossmodal support of language comprehension in humans can be applied in computational solutions for situated language comprehension, and how they can help to mitigate the shortcomings of current approaches.

Highlights

  • Enabling artificial systems to engage in a natural and smooth spoken dialog with humans is a major scientific and technological challenge

  • We review important findings from psycholinguistic research and confront them with recent advances in building crossmodal natural language comprehension systems, aiming to identify potential drawbacks of existing computational solutions and to learn from the human model how to overcome them

  • Gorniak and Roy (2007) presented a natural language understanding system whose comprehension and prediction capabilities improved when predictions derived from affordances were included in the decision-making process

Summary

INTRODUCTION

Enabling artificial systems to engage in a natural and smooth spoken dialog with humans is a major scientific and technological challenge. The fusion metaphor of combining the output of two independent information sources will not be viable if we aim for the more ambitious goal of exploiting a closed feedback loop between language and vision. Both subsystems seem to be developed as separate components that can produce and receive contributions from one another while they are processing the input; they interact with each other. This situation raises many questions about how the human mind organizes this interplay in detail and how certain aspects of it can be implemented in an artificial agent, leading to systems that, rather than fusing both modalities, maintain separate but interacting representations. We discuss some heuristics humans apply to speed up language comprehension (section 9).
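The closed feedback loop between separate but interacting representations can be illustrated with a toy sketch (all names and the scene are hypothetical, not taken from any system discussed here): each incoming word incrementally narrows the set of visual referent candidates, and the surviving candidates in turn constrain what the listener can predict next.

```python
# A toy visual scene: object identifiers mapped to their attribute sets.
# (Hypothetical data for illustration only.)
SCENE = {
    "red_cup":   {"red", "cup"},
    "blue_cup":  {"blue", "cup"},
    "red_plate": {"red", "plate"},
}

def resolve_incrementally(words, scene):
    """Resolve a referring expression word by word (incremental processing)."""
    candidates = set(scene)
    history = []
    for word in words:
        # Language -> vision: restrict visual attention to objects
        # whose attributes match the word heard so far.
        candidates = {obj for obj in candidates if word in scene[obj]}
        # Vision -> language: the remaining candidates are what the
        # comprehender can predict the next word to refer to.
        history.append((word, sorted(candidates)))
    return candidates, history

final, steps = resolve_incrementally(["red", "cup"], SCENE)
print(final)  # after "red" two candidates remain; "cup" leaves only red_cup
```

The point of the sketch is that neither module is fused into the other: the linguistic input filters the visual candidate set, and the visual candidate set feeds back as a prediction about upcoming words, mirroring the interaction described above.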

  • SPEAKER INTENTION
  • RESOLUTION OF LINGUISTIC AMBIGUITIES
  • CROSSMODAL REFERENCE
  • VISUAL GUIDANCE AND SEARCH
  • CROSSMODAL INTERACTION OF
  • INCREMENTALITY
  • PREDICTION
  • HEURISTIC DECISION TAKING
  • FINDINGS
  • CONCLUSIONS
