Abstract
We demonstrate an intelligent conversational agent system designed for advancing human-machine collaborative tasks. The agent interprets a user's communicative intent from both verbal utterances and non-verbal behaviors, such as gestures. It can likewise communicate through both natural language and gestures via its embodiment as an avatar, facilitating natural, symmetric multi-modal interaction. As use cases of our system, we demonstrate two intelligent agents with specialized skills in the Blocks World.
Highlights
Recent advances in speech recognition and natural language processing have resulted in the increasing use of intelligent assistants, such as Google Assistant, Siri, and Alexa, in our daily lives, replacing keyboard or touch interfaces.
We demonstrate a system for symmetric natural communication in which a computer interacts with its users through both verbal and non-verbal communication, enabling more robust conversation.
We demonstrate two use cases in the Blocks World domain.
Summary
Recent advances in speech recognition and natural language processing have resulted in the increasing use of intelligent assistants, such as Google Assistant, Siri, and Alexa, in our daily lives, replacing keyboard or touch interfaces. To facilitate the communication of a machine's complex ideas to the human, the machine's utterances need to be embellished with appropriate non-verbal behaviors. Our platform acts as the eyes and ears of the AI agent, tracking the blocks on the table (Son et al., 2016) and the multi-modal behaviors, both verbal and non-verbal, of the human interacting with it (Siddique et al., 2015). It provides an embodiment of the machine in the form of a simple humanoid avatar for users to interact with. The system is publicly available for use by the research community (Salter et al., 2017).
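As an illustration of how verbal and non-verbal channels might be combined, the sketch below resolves a deictic utterance ("move that block") using a pointing gesture. All names and data structures here are hypothetical and are not the actual API of our platform; it shows only the general idea of multi-modal intent fusion in a Blocks World setting.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical structures for the two input channels; not the platform's real API.

@dataclass
class Gesture:
    kind: str                      # e.g. "point"
    target: Optional[str] = None   # block id the pointing ray intersects

@dataclass
class Utterance:
    text: str

def interpret_intent(utterance: Utterance, gesture: Optional[Gesture]) -> dict:
    """Fuse the verbal and gestural channels into a single intent."""
    tokens = utterance.text.lower().split()
    intent = {"action": None, "object": None}
    if "put" in tokens or "move" in tokens:
        intent["action"] = "move"
    # Deictic words ("that", "this") are resolved from the gesture channel.
    if any(w in tokens for w in ("that", "this")) and gesture and gesture.kind == "point":
        intent["object"] = gesture.target
    return intent

# Usage: "move that block" plus pointing at block B3 resolves to moving B3.
print(interpret_intent(Utterance("move that block"), Gesture("point", "B3")))
```

The key design point is that neither channel alone determines the intent: the utterance supplies the action, while the gesture disambiguates the referent.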