Abstract

Emotionally expressive non-verbal vocalizations can play a major role in human-robot interactions. Humans can assess the intensity and emotional valence of animal vocalizations based on simple acoustic features such as call length and fundamental frequency, and these simple encoding rules are thought to be general across terrestrial vertebrates. To test the degree of this generalizability, we synthesized a set of artificial sounds by systematically varying call length and fundamental frequency, and examined how humans attribute emotional valence and intensity to them. Starting from sine-wave sounds, we generated sound samples in seven categories of increasing complexity by incorporating different characteristics of animal vocalizations. We used an online questionnaire to measure the perceived emotional valence and intensity of the sounds within a two-dimensional model of emotions. The results show that sounds with lower fundamental frequency and shorter call length were rated as more positive in valence, while samples with higher fundamental frequency were rated as more intense across all categories, regardless of sound complexity. We conclude that applying these basic rules of vocal emotion encoding is a good starting point for the development of novel non-verbal vocalizations for artificial agents.
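As a minimal sketch of the stimulus-generation approach the abstract describes (not the authors' actual code), the following Python snippet synthesizes pure sine-wave calls while systematically crossing the two manipulated parameters. The sample rate, parameter grids, and fade-envelope choice are illustrative assumptions, not values taken from the study:

```python
import numpy as np

SAMPLE_RATE = 44_100  # Hz; an assumed CD-quality sampling rate


def synthesize_call(f0_hz: float, call_length_s: float) -> np.ndarray:
    """Generate one pure sine-wave 'call' at fundamental frequency f0_hz,
    lasting call_length_s seconds, with a short linear fade in/out to
    avoid clicks at the call boundaries."""
    t = np.linspace(0.0, call_length_s,
                    int(SAMPLE_RATE * call_length_s), endpoint=False)
    call = np.sin(2.0 * np.pi * f0_hz * t)
    fade = int(0.01 * SAMPLE_RATE)  # 10 ms fade; an arbitrary smoothing choice
    envelope = np.ones_like(call)
    envelope[:fade] = np.linspace(0.0, 1.0, fade)
    envelope[-fade:] = np.linspace(1.0, 0.0, fade)
    return call * envelope


# Systematically cross fundamental frequency and call length, as the
# abstract describes; these grids are hypothetical, not the study's levels.
stimuli = {
    (f0, dur): synthesize_call(f0, dur)
    for f0 in (200.0, 400.0, 800.0)   # fundamental frequency (Hz)
    for dur in (0.1, 0.3, 0.9)        # call length (s)
}
```

Each resulting array can then be layered with further acoustic characteristics (e.g., noise or harmonics) to build progressively more complex stimulus categories.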

Highlights

  • Expressive non-verbal vocalizations can play a major role in human-robot interactions

  • Simple coding rules of affective vocalization appear to be shared across mammals, likely reflecting homologous signal production and neural processing[30]: higher fundamental frequency signals higher intensity, and shorter call length signals more positive valence, as supported by studies on multiple species and acoustic parameters (e.g., low harmonic-to-noise ratio is connected to higher arousal in baboons[32], dogs[25], and bonnet macaques, Macaca radiata[33])

  • In the Linear Mixed Model, both fundamental frequency and call length interacted with sound category and language (see the model-fitting sketch after these highlights)

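To make the analysis in the last highlight concrete, here is a hedged sketch of the kind of Linear Mixed Model it describes: perceived-emotion ratings modelled with fundamental frequency and call length in interaction with sound category and respondent language, plus a random intercept per participant. The column names and toy data below are assumptions for illustration, not the study's dataset:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
data = pd.DataFrame({
    "rating": rng.normal(size=n),                        # e.g. valence or intensity score
    "f0": rng.choice([200.0, 400.0, 800.0], size=n),     # fundamental frequency (Hz)
    "call_length": rng.choice([0.1, 0.3, 0.9], size=n),  # call length (s)
    "category": rng.choice(list("ABCDEFG"), size=n),     # seven sound-complexity categories
    "language": rng.choice(["hu", "en"], size=n),        # respondent language
    "participant": rng.integers(0, 40, size=n),          # random-effect grouping factor
})

# '*' expands to main effects plus the interaction terms the highlight mentions.
model = smf.mixedlm(
    "rating ~ f0 * category + f0 * language"
    " + call_length * category + call_length * language",
    data,
    groups=data["participant"],
)
result = model.fit()
print(result.summary())
```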


Introduction

Expressive non-verbal vocalizations can play a major role in human-robot interactions. The proposed functions of specific robots do not always require the level of complexity found in human communication[6], and robots' capabilities are often not aligned with those of humans (e.g., a robot with 360° vision has no need for head-turning[9], and synthetic sound production faces no morphological limitations). To avoid these issues, another approach is to treat human-robot interaction (HRI) as an interspecific interaction in which the artificial agent is regarded as a separate species that only has to be equipped with the basic social competence and communicative skills aligned with its function[9]. Such communication can build on simple coding rules of affective vocalization, namely that higher fundamental frequency is connected to higher intensity and shorter call length to positive valence. These rules are substantiated by studies on multiple mammalian species (for reviews see[14,15,21,31]) and for various acoustic parameters (e.g., low harmonic-to-noise ratio is connected to higher arousal in baboons[32], dogs[25], and bonnet macaques, Macaca radiata[33]).
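As a minimal illustration of these coding rules (an assumption-laden sketch for demonstration, not a validated model from the study), the mapping from the two acoustic parameters onto a two-dimensional emotion model could look like this; the linear scaling and reference values are arbitrary choices:

```python
def predict_affect(f0_hz: float, call_length_s: float) -> dict:
    """Map the two acoustic parameters onto a two-dimensional emotion model.
    Returns scores roughly in [-1, 1]; the normalisation constants are
    hypothetical, not fitted to the study's data."""
    # Higher fundamental frequency -> higher perceived intensity.
    intensity = min(1.0, max(-1.0, (f0_hz - 400.0) / 400.0))
    # Shorter call length -> more positive perceived valence.
    valence = min(1.0, max(-1.0, (0.5 - call_length_s) / 0.5))
    return {"intensity": intensity, "valence": valence}


# Example: a short, low-pitched call is predicted to sound calm and positive,
# while a long, high-pitched call is predicted to sound intense and negative.
print(predict_affect(f0_hz=200.0, call_length_s=0.1))
print(predict_affect(f0_hz=800.0, call_length_s=0.9))
```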

