Abstract

Affective computing, especially from speech, is one of the key steps toward building more natural and effective human-machine interaction. In recent years, several emotional speech corpora in different languages have been collected; however, Turkish is not among the languages that have been investigated in the context of emotion recognition. For this purpose, a new Turkish emotional speech database, comprising 5,100 utterances extracted from 55 Turkish movies, was constructed. Each utterance in the database is labeled both with an emotion category (happy, surprised, sad, angry, fearful, neutral, and others) and along three emotional dimensions (valence, activation, and dominance). We performed classification of four basic emotion classes (neutral, sad, happy, and angry) and estimation of the emotion primitives using acoustic features. The importance of the acoustic features in estimating the emotion primitive values and in classifying emotions into categories was also investigated. An unweighted average recall of 45.5% was obtained for the classification. For emotion dimension estimation, we obtained promising results for the activation and dominance dimensions. For valence, however, the correlation between the averaged ratings of the evaluators and the estimates was low. Cross-corpus training and testing also showed good results for the activation and dominance dimensions.
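The abstract reports results as unweighted average recall (UAR), the mean of the per-class recalls, so each emotion class counts equally regardless of how many utterances it has; this matters because emotional speech corpora are typically dominated by neutral utterances. A minimal sketch of the metric (the function name and toy labels below are our own, not from the paper):

```python
def unweighted_average_recall(y_true, y_pred, classes):
    # UAR = mean over classes of (correctly predicted / total) for that class,
    # so a rare class like "happy" weighs as much as a frequent "neutral".
    recalls = []
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)

# Toy example: recalls are 1/2 (angry), 2/3 (sad), 1 (happy).
uar = unweighted_average_recall(
    ["angry", "angry", "sad", "sad", "sad", "happy"],
    ["angry", "sad", "sad", "sad", "happy", "happy"],
    ["angry", "sad", "happy"],
)
```

With four balanced classes, chance-level UAR is 25%, which puts the reported 45.5% in context.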

Highlights

  • Recognizing the emotional state of the interlocutor and adapting the way of communicating play a crucial role in the success of human-computer interaction

  • In recent years, several emotional speech corpora in different languages have been collected; Turkish is not among the languages that have been investigated in the context of emotion recognition

  • In this work, we carried out a study on emotion recognition from Turkish speech using acoustic features


Summary

Introduction

Recognizing the emotional state of the interlocutor and adapting the way of communicating play a crucial role in the success of human-computer interaction.

Acquisition

Collecting real-life utterances is a challenging task; most previous studies have used speech data with studio-recorded emotions. We decided to use Turkish movies from various genres for data collection, because speech extracted from movies is more realistic than studio-recorded emotions expressed by speakers reading pre-defined sentences. The annotators were asked to listen to the entire set of speech recordings (randomly permuted) and assign an emotion label (both categorical and dimensional) to each utterance.

