Abstract

In cochlear implants (CIs), acoustic speech cues, especially for pitch, are delivered in a degraded form. This study assessed whether, because of these degraded pitch cues, normal-hearing (NH) listeners and CI users employ different perceptual strategies to recognize vocal emotions and, if so, how these strategies differ. Voice actors were recorded pronouncing a nonce word in four different emotions: anger, sadness, joy, and relief. The pitch cues of these recordings were phonetically analyzed, and the recordings were used to test emotion recognition in 20 NH listeners and 20 CI users. In line with previous studies, high-arousal emotions had a higher mean pitch, a wider pitch range, and more dominant pitches than low-arousal emotions; with respect to pitch, speakers thus differentiated emotions by arousal rather than by valence. NH listeners outperformed CI users in emotion recognition, even when presented with CI-simulated stimuli. However, only NH listeners recognized one particular actor's emotions less accurately than those of the other actors. The two groups therefore behaved differently when presented with similar input, showing that they employed different strategies. Considering that speaker's deviating pronunciation, it appears that for NH listeners mean pitch is a more salient cue than pitch range, whereas CI users are biased toward pitch range cues.
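As a rough illustration of the kind of pitch-cue analysis described above (mean pitch and pitch range per recording), the sketch below estimates mean F0 and F0 range with librosa's pYIN tracker. The file name, the F0 search bounds, and the use of librosa (rather than the phonetic software actually used in the study) are assumptions made for the example only.

```python
import numpy as np
import librosa

# Hypothetical file name; the study's own recordings are not reproduced here.
y, sr = librosa.load("nonce_word_anger.wav", sr=None)

# Track F0 with pYIN; the 65-600 Hz search range is an assumed bound
# broad enough for expressive male and female speech.
f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=65.0, fmax=600.0, sr=sr)

voiced_f0 = f0[voiced_flag]                       # keep voiced frames only
mean_f0 = np.mean(voiced_f0)                      # "mean pitch" cue
f0_range = np.max(voiced_f0) - np.min(voiced_f0)  # "pitch range" cue

print(f"mean F0: {mean_f0:.1f} Hz, F0 range: {f0_range:.1f} Hz")
```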

Highlights

  • In everyday situations, speech conveys a message through semantic content and through indexical cues, such as the talker’s emotional state

  • As the signal delivered at each electrode is modulated by the speech envelope, which carries temporal F0 cues, pitch perception remains possible to a limited extent. Research on another task that relies on the perception of temporal and spectral cues of a speaker's voice confirmed that cochlear implant (CI) users mostly rely on temporal voice pitch cues, whereas NH listeners can utilize both spectral and temporal voice pitch cues (Fu, Chinchilla, & Galvin, 2004; Fu, Chinchilla, Nogaki, & Galvin, 2005; Fuller, Gaudrain, et al., 2014; Kovacic & Balaban, 2009, 2010; Wilkinson, Abdel-Hamid, Galvin, Jiang, & Fu, 2013)

  • Our results indicate that acoustic cues for emotion recognition are ranked by listeners on the basis of salience, and that these cue orderings are different for NH listeners and CI users

Introduction

Speech conveys a message through semantic content and through indexical cues, such as the talker's emotional state. Luo et al. showed that emotion recognition was better in NH listeners listening to acoustic simulations of CIs (4–8 channels) than in actual CI users. These studies suggested that, due to the aforementioned limitations in temporal and spectral cues in CIs, emotion recognition in CI users is based mostly on the acoustic cues of intensity and duration, and not on pitch or other voice characteristics. Research on another task that relies on the perception of temporal and spectral cues of a speaker's voice confirmed that CI users mostly rely on temporal voice pitch cues, whereas NH listeners can utilize both spectral and temporal voice pitch cues (Fu, Chinchilla, & Galvin, 2004; Fu, Chinchilla, Nogaki, & Galvin, 2005; Fuller, Gaudrain, et al., 2014; Kovacic & Balaban, 2009, 2010; Wilkinson, Abdel-Hamid, Galvin, Jiang, & Fu, 2013).
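Acoustic CI simulations of the kind mentioned above typically replace spectral detail with a small number of envelope-modulated noise bands. The sketch below is not the authors' implementation; the channel count, band edges, and 160 Hz envelope cutoff are assumptions chosen to retain the temporal envelope (and hence temporal F0) cues discussed in the text while discarding fine spectral structure.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocoder(signal, sr, n_channels=8, lo=200.0, hi=7000.0):
    """Minimal noise-band vocoder sketch (CI acoustic simulation)."""
    edges = np.geomspace(lo, hi, n_channels + 1)   # assumed log-spaced band edges
    env_lp = butter(2, 160.0, btype="low", fs=sr, output="sos")  # envelope smoother
    out = np.zeros_like(signal)
    for k in range(n_channels):
        band_sos = butter(4, [edges[k], edges[k + 1]],
                          btype="bandpass", fs=sr, output="sos")
        band = sosfiltfilt(band_sos, signal)
        envelope = np.abs(hilbert(band))            # temporal envelope of this band
        envelope = sosfiltfilt(env_lp, envelope)    # keeps temporal F0 cues, drops fine structure
        carrier = sosfiltfilt(band_sos, np.random.randn(len(signal)))  # band-limited noise
        out += envelope * carrier
    return out / np.max(np.abs(out))
```

Passing a recording through `noise_vocoder` with 4 to 8 channels gives NH listeners a stimulus that approximates the spectral and temporal degradation described for CI hearing.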
