Abstract

Recent studies suggest that the time course for recognizing vocal expressions of basic emotion in speech varies significantly by emotion type, implying that listeners uncover acoustic evidence about emotions at different rates in speech (e.g., fear is recognized most quickly whereas happiness and disgust are recognized relatively slowly; Pell and Kotz, 2011). To investigate whether vocal emotion recognition is largely dictated by the amount of time listeners are exposed to speech or the position of critical emotional cues in the utterance, 40 English participants judged the meaning of emotionally-inflected pseudo-utterances presented in a gating paradigm, where utterances were gated as a function of their syllable structure in segments of increasing duration from the end of the utterance (i.e., gated syllable-by-syllable from the offset rather than the onset of the stimulus). Accuracy for detecting six target emotions in each gate condition and the mean identification point for each emotion in milliseconds were analyzed and compared to results from Pell and Kotz (2011). We again found significant emotion-specific differences in the time needed to accurately recognize emotions from speech prosody, and new evidence that utterance-final syllables tended to facilitate listeners' accuracy in many conditions when compared to utterance-initial syllables. The time needed to recognize fear, anger, sadness, and neutral from speech cues was not influenced by how utterances were gated, although happiness and disgust were recognized significantly faster when listeners heard the end of utterances first. Our data provide new clues about the relative time course for recognizing vocally-expressed emotions within the 400–1200 ms time window, while highlighting that emotion recognition from prosody can be shaped by the temporal properties of speech.

Highlights

  • Emotional events and social displays of emotion—the expression of a face, the tone of a speaker’s voice, and/or their body posture and movements—must be decoded successfully and quickly to avoid negative outcomes and to promote individual goals

  • Perceptual-acoustic studies show that basic emotions can be reliably identified and differentiated at high accuracy levels from prosodic cues alone, and that these expressions are marked by distinct acoustic patterns characterized by differences in perceived duration, speech rate, intensity, pitch register and variation, and other speech parameters

  • Biologically-specified codes for signaling emotions in the voice (e.g., Ekman, 1992; Wilson and Wharton, 2006) presumably bear an important relationship to the temporal features of spoken language. This phenomenon, which has been highlighted at different times (Cosmides, 1983; Scherer, 1988), could be explained by the accent structure of the utterances we presented for emotion recognition and by natural processes of speech production, factors which both contribute to the “socialization” or shaping of vocal emotion expressions in the context of spoken language

Introduction

Social displays of emotion—the expression of a face, the tone of a speaker’s voice, and/or their body posture and movements—must be decoded successfully and quickly to avoid negative outcomes and to promote individual goals. Studies demonstrating accurate pan-cultural recognition of emotional faces (Izard, 1971; Ekman, 1972) and distinct patterns of autonomic nervous system activity in response to certain emotions (e.g., Ekman et al., 1983; Levenson, 1992) have served to fuel the idea of a fixed set of discrete and hypothetically “basic” emotions (typically anger, fear, disgust, sadness, and happiness), although opinions vary (see Ekman, 1992; Sauter et al., 2010). Speech rate tends to decrease when speakers are sad and increase when speakers experience fear; at the same time, differences in relative pitch height, variation, and other cue configurations serve to differentiate these (and other) emotional meanings.
