Abstract

Speech emotion is an important paralinguistic element of spoken communication, and one that inevitably involves a high degree of subjectivity, since the implicated emotional states lack a concrete model. In particular, emotional expression varies considerably across spoken languages and individual speakers. The present work investigates the potential for discriminating emotional states in an adaptive/personalized setting, aiming at the creation of an effective multimodal speech emotion recognition service. In this context, an emotional speech ground-truth database is compiled, containing semantically/emotionally loaded utterances of a single speaker in five basic emotions. In the conducted experiments, several classification algorithms are implemented, and their results are compared against those obtained on a generalized/augmented multi-speaker emotional speech database. Furthermore, an audio-based application is designed for real-time emotion identification, utilizing speech recording tools combined with a camera and a speech-to-text module. The audio, video, and text files for every spoken utterance are labeled and stored via a user-friendly, functional GUI for the subsequent augmentation of the personalized database.
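The adaptive/personalized approach described above can be sketched as a classifier whose per-emotion models are updated as newly labeled utterances augment the speaker's database. The snippet below is a minimal illustrative sketch only, not the paper's method: it assumes utterances have already been reduced to fixed-length feature vectors (real systems would extract acoustic features such as MFCCs, which are out of scope here) and uses a simple nearest-centroid rule; the class name and the example vectors are hypothetical.

```python
import math

class PersonalizedEmotionClassifier:
    """Toy nearest-centroid classifier: one centroid per emotion,
    updated incrementally as labeled utterances are added."""

    def __init__(self):
        self.sums = {}    # emotion -> running sum of feature vectors
        self.counts = {}  # emotion -> number of labeled utterances

    def add_utterance(self, emotion, features):
        """Augment the personalized database with one labeled utterance."""
        if emotion not in self.sums:
            self.sums[emotion] = [0.0] * len(features)
            self.counts[emotion] = 0
        self.sums[emotion] = [s + f for s, f in zip(self.sums[emotion], features)]
        self.counts[emotion] += 1

    def predict(self, features):
        """Return the emotion whose centroid is nearest in Euclidean distance."""
        best, best_dist = None, math.inf
        for emotion, total in self.sums.items():
            centroid = [s / self.counts[emotion] for s in total]
            dist = math.dist(centroid, features)
            if dist < best_dist:
                best, best_dist = emotion, dist
        return best
```

In this toy setup, each call to `add_utterance` plays the role of the GUI-driven database augmentation, shifting the speaker's per-emotion centroids toward their personal style of expression.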
