Abstract
Emotion recognition plays an essential role in affective computing: machines can infer emotional cues by observing human behavior. Speech emotion recognition (SER) is one of the key technologies that lets a machine recognize emotion from speech. Most existing SER models rely on emotional annotations obtained from human perceptual evaluations. However, emotion perception is subjective and depends on each annotator's emotional experiences, background, and culture, so disagreement among annotations is common. The most common way to decide on a ground truth is a majority vote or plurality rule. Previous studies use these methods to generate the ground truth of the test data, but they discard test samples that have no consensus label. In this work, we instead keep all annotations, calculate their frequencies, and generate the emotion ground truth with a thresholding method. The most important difference from previous work on SER is that we define SER as a multi-label task: each data point may have one or more emotions. After defining the ground truth for the complete test set, we explore whether removing minority annotations affects the confidence of SER systems. We use calibration error metrics to measure the accuracy and confidence of the predictions made by speech emotion classifiers. We plan to investigate two research questions: (1) Which label learning methods (e.g., hard-label, soft-label, multi-label, or distribution-label learning) yield better-calibrated classifiers without applying any calibration method? (2) Can predicting the agreement among annotators on sentence-level annotations improve the calibration of speech emotion classifiers? In our preliminary experiments, we first address the second question by training SER systems with the distribution-label learning method without discarding any annotations.
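To illustrate the label-generation step, the following minimal sketch turns the raw annotations of one utterance into a frequency distribution and then into a multi-label (multi-hot) ground truth by thresholding. The function name, the four-class emotion set, and the threshold of 1 / (number of classes) are assumptions made for illustration only; the abstract does not state the threshold or the class set actually used.

```python
from collections import Counter


def multilabel_from_annotations(annotations, classes, threshold=None):
    """Return (frequency distribution, multi-hot label) for one utterance.

    `threshold` defaults to 1 / len(classes); this default is an assumption
    for illustration, not a value taken from the abstract.
    """
    if not annotations:
        raise ValueError("need at least one annotation")
    if threshold is None:
        threshold = 1.0 / len(classes)

    counts = Counter(annotations)
    total = len(annotations)
    # Relative frequency of each class among all annotations (none discarded).
    distribution = [counts.get(c, 0) / total for c in classes]
    # A class is part of the ground truth if its frequency exceeds the threshold.
    multi_hot = [1 if p > threshold else 0 for p in distribution]
    return distribution, multi_hot


# Hypothetical class set and annotations for a single utterance.
CLASSES = ["angry", "sad", "happy", "neutral"]
votes = ["happy", "happy", "neutral", "neutral", "sad"]

dist, label = multilabel_from_annotations(votes, CLASSES)
print(dist)
print(label)
```

In this example, "happy" and "neutral" are tied, so a plurality rule gives no consensus label and the utterance would be discarded. With thresholding, both emotions are kept as ground truth and the single "sad" vote still contributes to the distribution.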
We evaluate the preliminary experiments on the MSP-PODCAST corpus and report the results using several evaluation metrics.
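As one example of a calibration error metric, the sketch below computes the widely used binned expected calibration error (ECE): predictions are grouped by confidence, and the gap between average confidence and accuracy in each bin is averaged, weighted by bin size. This is the standard top-label formulation with equal-width bins; whether the authors use this exact variant, and how they adapt it to the multi-label setting, is not stated in the abstract, so treat it as an assumption.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE over predictions.

    confidences: predicted confidence of each prediction, in [0, 1].
    correct:     1 if the prediction was right, 0 otherwise.
    Bins are equal-width, [k/n_bins, (k+1)/n_bins), with 1.0 put in the last bin.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    count = [0] * n_bins
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        conf_sum[idx] += conf
        hit_sum[idx] += hit
        count[idx] += 1

    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sum[b] / count[b]
        accuracy = hit_sum[b] / count[b]
        ece += (count[b] / n) * abs(accuracy - avg_conf)
    return ece


# Toy example: four predictions at 0.9 confidence, three of them correct.
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
print(round(ece, 4))
```

Here the classifier claims 90% confidence but is right 75% of the time, so the ECE is about 0.15. A perfectly calibrated classifier would score 0.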