Abstract

Emotion shapes all aspects of our interpersonal and intellectual experiences, so its automatic analysis has many applications. In this paper, we propose an emotional tonal speech dataset, the Mandarin Chinese Emotional Speech Dataset-Portrayed (MES-P), with both distal and proximal labels. In contrast with state-of-the-art datasets, which focus only on perceived emotions, MES-P includes not only perceived emotions (proximal labels) but also intended emotions (distal labels), making it possible to study human emotional intelligence, i.e., the ability to express and understand emotion, as well as the emotional misunderstandings that arise in real life. Furthermore, MES-P captures a key feature of tonal languages by providing emotional speech samples that match the tonal distribution of real-life Mandarin. The dataset also features emotion intensity variations, introducing both moderate and intense versions of joy, anger, and sadness in addition to neutral. The collected speech samples are rated as continuous coordinate locations in valence-arousal (VA) space, yielding an emotional distribution pattern in the 2D VA space. High consistency between the speakers' emotional intentions and the listeners' perceptions is demonstrated by Cohen's Kappa coefficients. Finally, extensive experiments on MES-P establish a baseline for automatic emotion recognition and compare it with human emotional intelligence.
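The agreement statistic mentioned above can be illustrated with a minimal sketch. The following Python snippet computes Cohen's Kappa between a speaker's intended (distal) labels and listeners' perceived (proximal) labels; the label sequences are illustrative stand-ins, not data from MES-P.

```python
# Sketch: Cohen's Kappa between intended (distal) and perceived (proximal)
# emotion labels. The example labels below are hypothetical, not MES-P data.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two equal-length categorical label sequences."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: expected overlap of the two marginal distributions.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Illustrative intended vs. perceived labels for eight utterances.
intended  = ["joy", "joy", "anger", "sad", "neutral", "anger", "sad", "joy"]
perceived = ["joy", "neutral", "anger", "sad", "neutral", "anger", "joy", "joy"]
print(round(cohens_kappa(intended, perceived), 3))  # → 0.66
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is preferred over simple accuracy when label distributions are imbalanced, as with emotion categories.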
