Automatic Generation of Head Nods using Utterance Texts

Ryo Ishii, Taichi Katayama, Junji Tomita, Ryuichiro Higashinaka

doi:10.1109/roman.2018.8525729

Abstract

We propose a model that generates head nods accompanying an utterance from natural language text. To the best of our knowledge, previous models generated simple nods from only the final words of an utterance, using a bag-of-words representation. We instead analyze the utterance text using several types of language information: dialog act, part of speech, a large-scale Japanese thesaurus, and word position in a sentence. We also generate detailed parameters of the speaker's nodding, namely presence, frequency, and depth, which is the first attempt to do so. First, we compiled a Japanese corpus of 24 dialogues including utterance and nod information. Next, using the corpus, we constructed a generation model that estimates nodding presence, frequency, and depth during a phrase from these types of language information together with bag of words. The results indicate that our model outperformed both chance level and simple automatic nod-generation models that use only bag of words. The results also indicate that dialog act, part of speech, the large-scale Japanese thesaurus, and word position are useful for generating nods. Finally, we conducted a subjective evaluation of whether our nod-generation model is useful for conversational agents. The results show that the nodding generated with our model improves user impressions of naturalness, humanness, likability, and reliability toward a conversational agent.

Full Text