Abstract

Speech emotion is an important paralinguistic element of spoken communication, and one that inevitably involves a high degree of subjectivity, since the implicated emotional states lack a concrete model. In particular, emotional expression varies considerably across spoken languages and individual speakers. The present work investigates the potential for discriminating emotional states in an adaptive/personalized setting, aiming at the creation of an effective multimodal speech emotion recognition service. In this context, an emotional speech ground-truth database is compiled, containing semantically/emotionally loaded utterances of a single speaker in five basic emotions. In the conducted experiments, several classification algorithms are implemented, and their results are compared against those obtained on a generalized/augmented multi-speaker emotional speech database. Furthermore, an audio-based application is designed for real-time emotion identification, utilizing speech recording tools combined with a camera and a speech-to-text module. The audio, video, and text files for every spoken utterance are labeled and stored via a user-friendly, functional GUI for the subsequent augmentation of the personalized database.
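The adaptive/personalized approach described above can be sketched as a classifier whose per-emotion models are updated as newly labeled utterances augment the speaker's database. The snippet below is a minimal illustrative sketch only, not the paper's method: it assumes utterances have already been reduced to fixed-length feature vectors (real systems would extract acoustic features such as MFCCs, which are out of scope here) and uses a simple nearest-centroid rule; the class name and the example vectors are hypothetical.

```python
import math

class PersonalizedEmotionClassifier:
    """Toy nearest-centroid classifier: one centroid per emotion,
    updated incrementally as labeled utterances are added."""

    def __init__(self):
        self.sums = {}    # emotion -> running sum of feature vectors
        self.counts = {}  # emotion -> number of labeled utterances

    def add_utterance(self, emotion, features):
        """Augment the personalized database with one labeled utterance."""
        if emotion not in self.sums:
            self.sums[emotion] = [0.0] * len(features)
            self.counts[emotion] = 0
        self.sums[emotion] = [s + f for s, f in zip(self.sums[emotion], features)]
        self.counts[emotion] += 1

    def predict(self, features):
        """Return the emotion whose centroid is nearest in Euclidean distance."""
        best, best_dist = None, math.inf
        for emotion, total in self.sums.items():
            centroid = [s / self.counts[emotion] for s in total]
            dist = math.dist(centroid, features)
            if dist < best_dist:
                best, best_dist = emotion, dist
        return best
```

In this toy setup, each call to `add_utterance` plays the role of the GUI-driven database augmentation, shifting the speaker's per-emotion centroids toward their personal style of expression.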
