Abstract

Speech variability in real-world situations makes spoken emotion recognition a challenging task. While a variety of temporal and spectral speech features have been proposed, this paper investigates the effectiveness of the glottal airflow signal in recognizing emotions. The speech used in this investigation is from a classical recording of the theatrical play "Waiting for Godot" by Samuel Beckett. Six emotions were investigated: happy, angry, sad, fear, surprise, and neutral. The proposed method was tested on the original recording and under simulated distortion conditions. In clean signal conditions it achieved average recognition rates of 76% for four emotions and 66.5% for all six. It also proved fairly robust under signal distortion and noise: for severely low-pass filtered speech, recognition rates were 60% for four and 51.6% for six emotions, while with additive white Gaussian noise at SNR = 10 dB they were 53% and 47% for the four- and six-emotion tasks, respectively. Results indicate that glottal signal features provide good separation of spoken emotions and achieve enhanced classification performance compared to other approaches.

Highlights

  • Interpersonal communication is greatly facilitated by the detection of emotion through visual and auditory cues, which are used to deduce the motive, intent, and general psychological state of a person

  • The goal of this work is to study the contribution of the glottal flow signal in differentiating emotional states and whether glottal-based features can be effective in spoken emotion recognition

  • In our previous work [11], we studied the effect of selecting appropriate glottal and/or speech features for emotion recognition on a small database and established that glottal features alone were sufficient

Introduction

Interpersonal communication is greatly facilitated by the detection of emotion through visual and auditory cues, which are used to deduce the motive, intent, and general psychological state of a person. Because of the multilayered processes (cognitive, linguistic, and articulatory) involved in its production, speech is a main vehicle for emotional expression, which in turn enhances the information contained in the intended spoken message. Phonetic, prosodic, and linguistic features undergo transformations associated with emotional expression. In this context, acoustical analysis aims at the robust extraction of relevant signal features that best describe the changes associated with a particular emotion. Speech analysis has therefore received considerable attention: it offers clear advantages over other techniques because it is nonintrusive, and the signal can be acquired with a microphone, even over the telephone.
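The noisy test condition mentioned in the abstract (additive white Gaussian noise at SNR = 10 dB) can be reproduced with a short sketch. This is a minimal illustration, not the paper's code: the function name `add_awgn` and the synthetic tone standing in for speech are assumptions for the example.

```python
import numpy as np

def add_awgn(clean, snr_db, rng=None):
    """Add white Gaussian noise so the result has the target SNR in dB.

    Noise power is set to P_signal / 10**(snr_db / 10), so that
    10 * log10(P_signal / P_noise) == snr_db.  (Illustrative helper,
    not taken from the paper.)
    """
    rng = np.random.default_rng(0) if rng is None else rng
    p_signal = np.mean(clean ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(p_noise), size=clean.shape)
    return clean + noise

# Example: degrade a synthetic 220 Hz tone (a stand-in for a speech
# signal, sampled at 16 kHz) at the SNR = 10 dB condition tested above.
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
clean = np.sin(2.0 * np.pi * 220.0 * t)
noisy = add_awgn(clean, snr_db=10.0)
```

The empirical SNR of `noisy` against `clean` comes out very close to the requested 10 dB; the same helper can be reused for other SNR levels when simulating degraded channels.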
