Abstract

We propose a model that automatically generates whole-body motions accompanying utterances at appropriate times, as humans do, using various types of natural-language-analysis information obtained from spoken language. Specifically, we focus on the co-occurrence relationships between human motions and various types of natural-language-analysis information, such as the words in the spoken language, their parts of speech, thesaurus categories, word positions, and the dialogue acts of the utterance. Our model automatically generates nods, head postures, facial expressions, hand gestures, and upper-body postures using this information. We first recorded two-person dialogues and constructed a multimodal corpus containing utterance and whole-body motion information. Next, using this corpus, we trained a model that generates a motion for each phrase unit via machine learning, taking the words, parts of speech, thesaurus categories, word positions, and dialogue acts of the entire utterance as inputs. These types of natural-language-analysis information proved useful for motion generation. The effectiveness of our model was verified through a subjective experiment with a virtual conversational agent. With our model, impressions of the agent's body motions improved in terms of naturalness of motion, degree of coincidence between utterance and motion, humanness of the agent, and likability of the agent.
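As a minimal sketch of the kind of input representation the abstract describes (not the authors' actual implementation, whose feature encoding and learning algorithm are not specified here), the natural-language-analysis information for one phrase unit can be assembled into a feature vector that a classifier would map to a motion label such as a nod or a hand gesture. All function and feature names below are hypothetical:

```python
# Hypothetical sketch: one feature vector per phrase unit, built from the
# information types named in the abstract -- words, parts of speech,
# thesaurus categories, word position, and the utterance's dialogue act.

def phrase_features(words, pos_tags, thesaurus_cats,
                    phrase_index, num_phrases, dialogue_act):
    """Build a sparse (dict-based) feature vector for one phrase unit."""
    feats = {}
    for w in words:                       # surface words in the phrase
        feats[f"word={w}"] = 1
    for p in pos_tags:                    # parts of speech
        feats[f"pos={p}"] = 1
    for c in thesaurus_cats:              # thesaurus categories
        feats[f"thes={c}"] = 1
    # relative position of the phrase within the whole utterance
    feats["position"] = phrase_index / max(num_phrases - 1, 1)
    feats[f"act={dialogue_act}"] = 1      # dialogue act of the utterance
    return feats

# Example: the second of three phrases in an utterance labeled as a question
f = phrase_features(["you", "think"], ["PRON", "VERB"],
                    ["mental-action"], 1, 3, "question")
```

A learned model (the abstract does not name the algorithm) would then map such vectors to motion labels for each phrase unit.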
