Abstract

Speech emotion recognition has received increasing interest in recent years and is often conducted under the assumption that the speech utterances in the training and testing datasets are obtained under the same conditions. In reality, however, this assumption does not hold, as speech data are often collected from different devices or environments. The resulting discrepancy between training and testing data adversely affects recognition performance. In this paper, we examine the problem of cross-corpus speech emotion recognition. To address it, we present a novel transfer linear subspace learning (TLSL) framework that learns a common feature subspace for the source and target datasets. In TLSL, a nearest-neighbor graph algorithm measures the similarity between different corpora, and a feature grouping strategy divides the emotional features into two categories: a high transferable part (HTP) and a low transferable part (LTP). To explore TLSL in different scenarios, we propose two TLSL approaches, transfer unsupervised linear subspace learning (TULSL) and transfer supervised linear subspace learning (TSLSL), and provide solutions for the corresponding optimization problems. Extensive experiments on several benchmark datasets validate the effectiveness of TLSL for cross-corpus speech emotion recognition.
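The abstract describes TLSL only at a high level. As a rough illustration of its two ingredients, a nearest-neighbor graph linking source and target samples and a grouping of feature dimensions into HTP and LTP, the sketch below shows one plausible reading. All function names and the mean-gap scoring heuristic are illustrative assumptions, not the paper's actual method.

```python
# Minimal sketch (not the authors' implementation): a k-nearest-neighbor
# affinity graph across source/target features, plus a hypothetical split of
# feature dimensions into "high transferable" (HTP) and "low transferable"
# (LTP) groups. The per-dimension distribution-gap score is an assumption
# made for illustration only.
import numpy as np

def knn_affinity(X_src, X_tgt, k=5):
    """Binary affinity W[i, j] = 1 if target sample j is among the
    k nearest neighbors (Euclidean) of source sample i."""
    d2 = ((X_src[:, None, :] - X_tgt[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros_like(d2)
    np.put_along_axis(W, idx, 1.0, axis=1)
    return W

def split_htp_ltp(X_src, X_tgt, ratio=0.5):
    """Rank feature dimensions by the gap between source and target means
    (smaller gap -> assumed more transferable) and split them."""
    gap = np.abs(X_src.mean(axis=0) - X_tgt.mean(axis=0))
    order = np.argsort(gap)              # most similar dimensions first
    n_htp = int(len(order) * ratio)
    return order[:n_htp], order[n_htp:]  # HTP indices, LTP indices

# Toy usage with random stand-ins for acoustic feature matrices.
rng = np.random.default_rng(0)
X_src = rng.normal(size=(100, 20))           # source-corpus features
X_tgt = rng.normal(loc=0.3, size=(80, 20))   # shifted target-corpus features
W = knn_affinity(X_src, X_tgt, k=5)
htp, ltp = split_htp_ltp(X_src, X_tgt)
print(W.shape, len(htp), len(ltp))
```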
