Abstract

Data-driven synthesis of human motion during conversational speech is an active research area with applications that include character animation, computer gaming and conversational agents. Natural-looking motion is key both to the perceived realism and to the understanding of any synthesised animation. Multi-modal speech and body-motion data are scarce, so it is common to augment real motion data by mirroring the body pose to double the number of training samples. This augmentation is based on the assumption that a person’s gesturing is not affected by handedness and that the reflected pose is plausible. In this study, we explore the validity of this assumption by evaluating the reflective symmetry of a speaker’s arms during conversational exchanges. We analyse the left and right arm motion of 36 subjects during dyadic conversation and present the per-frame symmetry of the arm gestures. To identify temporal offsets caused by the presence of a leading hand, we compute the time lag between movements of the left and right arms. We perform a nearest-neighbour search to test whether mirrored poses are plausible. We also use information theory to examine the information gained by mirroring the data. Finally, we implement a speech-to-gesture generative model to evaluate the efficacy of lateral mirroring for data augmentation. Our findings suggest that both positional symmetry and left–right motion offsets vary from speaker to speaker. We conclude that data augmentation by mirroring is valid in certain cases when the mirrored data are treated as a new virtual identity, but that it should be applied with care as a generic approach if the gesturing style and handedness of the original speaker are to be maintained.

• Review the motion symmetry of multiple speakers during dyadic conversation.
• Analyse positional, temporal and informational symmetry of arm motion.
• Discuss the efficacy of lateral mirroring of the human body for data augmentation.
• Conclude that lateral mirroring is not suitable as a generic approach.
• Suggest that lateral mirroring to create a new identity is a suitable data augmentation method.
• Propose our statistical analysis as a means of evaluating speech-driven conversational agents.
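The abstract names several analyses without fixing their exact form. The following is a minimal NumPy sketch of lateral mirroring and of three of the measurements described, under stated assumptions: joint positions are stored as an array of shape (frames, joints, 3) in a body-centred frame whose x axis is the lateral (left–right) axis, and the eight-joint skeleton, `MIRROR_PAIRS` and all function names are hypothetical. The asymmetry score, the cross-correlation lag estimate and the brute-force nearest-neighbour distance are illustrative choices, not necessarily the measures used in the study.

```python
import numpy as np

# Hypothetical skeleton: joint names and their left/right partners.
JOINTS = ["spine", "neck", "l_shoulder", "l_elbow", "l_wrist",
          "r_shoulder", "r_elbow", "r_wrist"]
MIRROR_PAIRS = {"l_shoulder": "r_shoulder", "l_elbow": "r_elbow", "l_wrist": "r_wrist"}


def mirror_permutation(joints, pairs):
    """Index map that swaps each left joint with its right partner."""
    index = {name: i for i, name in enumerate(joints)}
    perm = list(range(len(joints)))
    for left, right in pairs.items():
        perm[index[left]], perm[index[right]] = index[right], index[left]
    return np.array(perm)


def mirror_pose(positions, perm):
    """Reflect (frames, joints, 3) positions in the x = 0 plane and relabel
    left/right joints so the result is still a valid skeleton."""
    mirrored = positions[:, perm, :]  # advanced indexing returns a copy
    mirrored[..., 0] *= -1.0          # assumption: x is the lateral axis
    return mirrored


def per_frame_asymmetry(positions, left_idx, right_idx):
    """Mean distance between left-arm joints and the reflected right-arm
    joints; 0 means the arms are mirror images in that frame."""
    right_reflected = positions[:, right_idx, :]
    right_reflected[..., 0] *= -1.0
    diff = positions[:, left_idx, :] - right_reflected
    return np.linalg.norm(diff, axis=-1).mean(axis=1)


def left_right_lag(left_signal, right_signal, max_lag):
    """Lag in frames that maximises the normalised cross-correlation.
    A positive value means the right signal trails the left one."""
    a = (left_signal - left_signal.mean()) / left_signal.std()
    b = (right_signal - right_signal.mean()) / right_signal.std()
    lags = list(range(-max_lag, max_lag + 1))
    scores = []
    for lag in lags:
        if lag >= 0:
            scores.append(np.mean(a[:len(a) - lag] * b[lag:]))
        else:
            scores.append(np.mean(a[-lag:] * b[:len(b) + lag]))
    return lags[int(np.argmax(scores))]


def nearest_real_distance(query, reference):
    """For each query pose, Euclidean distance to the closest reference pose
    (brute force over flattened joint coordinates)."""
    q = query.reshape(len(query), -1)
    r = reference.reshape(len(reference), -1)
    d2 = (q ** 2).sum(1)[:, None] + (r ** 2).sum(1)[None, :] - 2.0 * q @ r.T
    return np.sqrt(np.maximum(d2, 0.0)).min(axis=1)
```

In this sketch, mirroring is its own inverse and the asymmetry score is unchanged by it, so positional symmetry alone cannot reveal which hand leads. The lag estimate (computed, for example, on left and right wrist speeds) changes sign when left and right are swapped, which is why a leading hand in the original speaker becomes the opposite hand in the mirrored data.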

