Abstract

Speech emotion recognition (SER) involves feature extraction and classifier construction, but recognition performance is degraded by noise interference and gender differences. To address this problem, this paper builds two models based on adversarial multi-task learning (ASP-MTL). The first model takes emotion recognition as the main task and noise recognition as the auxiliary task, and removes the segments that the auxiliary task identifies as noise. The second model is then constructed on the remaining non-noise segments, with emotion recognition as the main task and gender classification as the auxiliary task. These two multi-task learning models not only use shared information to learn the relationships between tasks, but also capture task-specific features. Experiments use the Audio/Visual Emotion Challenge (AVEC) database and the AFEW6.0 database, both recorded in field environments. To address the imbalance between the datasets, a data-balancing step was applied during preprocessing. Compared with recent work on the AVEC and AFEW6.0 datasets, the proposed approach improves accuracy and F1 score by around 10%, demonstrating substantial progress in SER.
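The two-stage pipeline described above can be sketched in plain Python. This is only an illustrative data-flow sketch under assumed inputs, not the paper's trained ASP-MTL networks: the frame features (`energy`, `pitch`), the threshold rules standing in for the noise, emotion, and gender heads, and all function names are hypothetical.

```python
# Hypothetical sketch of the two-stage pipeline from the abstract:
# stage 1 (emotion main task + noise auxiliary task) drops frames that the
# noise head flags; stage 2 (emotion main task + gender auxiliary task)
# runs only on the remaining non-noise frames.
# The "heads" below are stand-in threshold rules, not trained models.

def noise_head(frame):
    # Stand-in for the stage-1 auxiliary noise detector:
    # treat low-energy frames as noise (an assumption for illustration).
    return frame["energy"] < 0.2

def stage1_filter(frames):
    # Remove the frames identified as noise by the auxiliary task.
    return [f for f in frames if not noise_head(f)]

def stage2_predict(frame):
    # Stand-in stage-2 heads: emotion is the main task,
    # gender classification the auxiliary task.
    emotion = "happy" if frame["pitch"] > 0.5 else "sad"
    gender = "female" if frame["pitch"] > 0.6 else "male"
    return {"emotion": emotion, "gender": gender}

frames = [
    {"energy": 0.1, "pitch": 0.4},   # low energy: flagged as noise, dropped
    {"energy": 0.9, "pitch": 0.7},
    {"energy": 0.5, "pitch": 0.3},
]

clean = stage1_filter(frames)                    # 2 frames survive stage 1
predictions = [stage2_predict(f) for f in clean]
```

In the actual ASP-MTL setup, each stage would share a feature extractor between the main and auxiliary tasks, with adversarial training keeping the shared representation task-invariant; the sketch only shows how the noise-filtering output of stage 1 feeds stage 2.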
