Abstract
Speech emotion recognition (SER) is one of the latest challenges in human-computer interaction. Conventional SER classification methods output a single emotion label per utterance as the estimation result, because the conventional emotional speech databases used to train SER models assign a single emotion label to each utterance. However, human speech often expresses multiple emotions simultaneously, with differing intensities. To realize more natural SER, the existence of multiple emotions in a single utterance should be taken into account. We therefore created an emotional speech database annotated with multiple emotions and their intensities. The database was constructed by extracting utterances in which emotions appear from existing video works. In addition, we evaluated the created database through statistical analysis. As a result, 2,025 samples were obtained, of which 1,525 contained multiple emotions.