Abstract
This work addresses a previously formalized semantic evaluation task, spatial role labeling (SpRL), which aims to extract formal spatial meaning from text. Here, we report the results of initial efforts toward exploiting visual information, in the form of images, to aid spatial language understanding. We discuss how to design new models within the framework of declarative learning-based programming (DeLBP). The DeLBP framework facilitates combining modalities and representing heterogeneous data in a unified graph. The learning and inference models exploit the structure of this unified graph, as well as global first-order domain constraints beyond the data, to predict the semantics that form a structured meaning representation of the spatial context. Continuous representations relate the elements of the graph originating from different modalities. We improve over the state-of-the-art results on SpRL.
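The idea of using continuous representations to relate graph elements from different modalities can be illustrated with a minimal sketch: a textual phrase and a set of image regions are each represented as dense vectors, and the phrase is aligned with the region whose embedding is most similar. The function names and the toy vectors below are illustrative assumptions, not the paper's actual implementation.

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def ground_phrase(phrase_vec, region_vecs):
    # Align a textual phrase with the most similar image region
    # by comparing their continuous representations.
    scores = [cosine(phrase_vec, r) for r in region_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Toy vectors standing in for learned text and image embeddings
# (hypothetical values for illustration only).
phrase = [0.9, 0.1, 0.3]
regions = [[0.1, 0.8, 0.2], [0.85, 0.15, 0.25]]
idx, score = ground_phrase(phrase, regions)
```

In a full DeLBP-style model, such similarity scores would feed the joint inference over the unified graph alongside the first-order domain constraints, rather than being used in isolation.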
Highlights
Spatial language understanding is important in many real-world applications such as geographical information systems, robotics, and navigation, for example when a robot with a head-mounted camera receives instructions about grabbing objects and finding their locations
One approach to spatial language understanding is to map natural language onto a formal spatial representation suitable for spatial reasoning
The contributions of this paper are: a) we report results on combining vision and language that extend and improve the state-of-the-art spatial role labeling models; b) we model the task in the framework of declarative learning-based programming and show its expressiveness in representing such complex structured-output tasks
Summary
Spatial language understanding is important in many real-world applications such as geographical information systems, robotics, and navigation, for example when a robot with a head-mounted camera receives instructions about grabbing objects and finding their locations. Previous research on spatial role labeling (Kordjamshidi et al., 2010, 2012, 2017b) and ISOSpace (Pustejovsky et al., 2011, 2015) aimed at formalizing this problem and providing machine learning solutions that find such a mapping in a data-driven way (Kordjamshidi et al., 2011; Kordjamshidi and Moens, 2015). These extractions are made from available textual resources. Our interest in formal meaning representation distinguishes our work from other vision-and-language tasks, as does our choice of data, since our future goal is to integrate explicit qualitative spatial reasoning models into learning and spatial language understanding