Abstract

Iconic gestures are used to depict physical objects mentioned in speech, and the gesture form is assumed to be based on the image of a given object in the speaker’s mind. Building on this idea, this study proposes a model that learns iconic gesture forms from an image representation obtained from pictures of physical entities. First, we collect a set of pictures of each entity from the web and create an average image representation from them. Subsequently, the average image representation is fed to a fully connected neural network to determine the gesture form. In the model evaluation experiment, our two-step gesture form selection method classifies seven types of gesture forms with over 62% accuracy. Furthermore, we demonstrate an example of gesture generation in a virtual agent system in which our model is used to create a gesture dictionary that assigns a gesture form to each entry word in the dictionary.
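To make the described pipeline concrete, the following is a minimal sketch (not the authors' code) of the two stages named in the abstract: per-entity image features are averaged into a single representation, and a small fully connected network maps that representation to one of seven gesture form classes. The feature dimension, hidden-layer size, and placeholder features are assumptions introduced purely for illustration.

```python
import numpy as np

NUM_GESTURE_FORMS = 7   # seven gesture form classes, per the abstract
FEATURE_DIM = 2048      # assumed size of the per-image feature vector

def average_image_representation(features: np.ndarray) -> np.ndarray:
    """Average per-image feature vectors (shape [n_images, FEATURE_DIM])
    into one representation for the entity."""
    return features.mean(axis=0)

class GestureFormClassifier:
    """Toy fully connected network: one ReLU hidden layer plus a softmax
    output over the seven gesture forms (layer sizes are assumptions)."""

    def __init__(self, hidden_dim: int = 256, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.01, (FEATURE_DIM, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(0.0, 0.01, (hidden_dim, NUM_GESTURE_FORMS))
        self.b2 = np.zeros(NUM_GESTURE_FORMS)

    def predict_proba(self, x: np.ndarray) -> np.ndarray:
        h = np.maximum(0.0, x @ self.w1 + self.b1)   # ReLU hidden layer
        logits = h @ self.w2 + self.b2
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()                       # softmax over 7 forms

    def predict(self, x: np.ndarray) -> int:
        return int(np.argmax(self.predict_proba(x)))

# Usage: features extracted from pictures of one entity (e.g. collected
# from the web) are averaged, then classified into a gesture form index.
entity_features = np.random.rand(10, FEATURE_DIM)    # placeholder features
avg_repr = average_image_representation(entity_features)
form_id = GestureFormClassifier().predict(avg_repr)
```

In a gesture dictionary as described above, this prediction step would be run once per entry word, storing the selected gesture form alongside the word.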
