Speech emotion classification is challenging for the Burmese language, which is low-resource and has received little research attention on this topic. To address the difficulty of the task, a novel feature extraction method for Burmese is proposed, and to address the lack of resources, a Burmese speech emotion corpus called BMISEC is built. The strengths of several feature extraction methods are combined into one robust feature by fusing four features: a novel text-tone feature, local binary patterns (LBP), mel-frequency cepstral coefficients (MFCC), and the discrete wavelet transform (DWT). To improve performance, a deep learning model called DenseNet-Emotion is used for classification, with a support vector machine (SVM) in DenseNet's classifier layer. To demonstrate the robustness of the proposed system, three types of experiments are conducted in the TensorFlow framework: an ablation study, experiments on three publicly available datasets, and comparisons with methods from previous research. The results show that feature fusion outperforms any single feature in emotion recognition, that performance on BMISEC is better than on the other datasets, and that the proposed method outperforms the previous methods. The proposed method achieves an accuracy of 88.388% with 50 epochs.
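As an illustration of what fusing four features can look like in practice, the following is a minimal sketch and not the authors' implementation. It assumes that fusion means concatenating the four per-utterance feature vectors after scaling each one to zero mean and unit variance; the abstract does not state the fusion rule, so both the concatenation and the scaling are assumptions, and the names `FEATURE_ORDER`, `standardize`, and `fuse_features` are hypothetical.

```python
from statistics import fmean, pstdev
from typing import Dict, List, Sequence

# Hypothetical feature names (not from the paper), kept in a fixed order
# so that every fused vector has the same layout.
FEATURE_ORDER = ("text_tone", "lbp", "mfcc", "dwt")


def standardize(values: Sequence[float]) -> List[float]:
    """Scale one feature vector to zero mean and unit variance.

    Assumption: per-vector scaling is used only so that no single
    feature dominates the fused vector by magnitude.
    """
    mean = fmean(values)
    std = pstdev(values)
    if std < 1e-12:
        # A constant vector carries no variation; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def fuse_features(features: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate the standardized feature vectors in FEATURE_ORDER."""
    fused: List[float] = []
    for name in FEATURE_ORDER:
        fused.extend(standardize(features[name]))
    return fused


# Toy vectors standing in for real extracted features.
example = {
    "text_tone": [1.0, 0.0, 2.0],
    "lbp": [10.0, 12.0, 14.0, 16.0],
    "mfcc": [-3.0, 0.5],
    "dwt": [0.2, 0.2, 0.2],
}
fused = fuse_features(example)
print(len(fused))
```

In a pipeline like the one described, a fused vector of this kind would be the input to the classifier; how DenseNet-Emotion actually consumes the fused feature is not specified in the abstract.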