Abstract

Speech emotion recognition, the task of detecting a speaker's emotional state from speech, remains a challenging field of study and is key to enhancing interaction between humans and machines. In many settings (e.g. embedded systems), emotion must be detected in speech under tight limits on both computing and memory resources. Although several previous works reported that a reasonable recognition rate can be achieved using transfer learning with popular models such as AlexNet, those models suffer from a large size and cannot be executed on an embedded system. To address this problem, we propose a lightweight deep convolutional neural network architecture that uses only a partial component of AlexNet, with Log-Mel-Spectrograms as input. Our results show that the proposed lightweight model achieves a recognition rate comparable to the state of the art, while using around 272 times fewer parameters than AlexNet.
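The Log-Mel-Spectrogram input mentioned above is a standard front-end feature: the short-time power spectrum of the signal is mapped onto a mel-scale filterbank and converted to decibels. The abstract does not give the authors' exact parameters, so the sketch below uses common defaults (16 kHz audio, 512-point FFT, 160-sample hop, 40 mel bands) purely for illustration:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    # Frame the signal, apply a Hann window, take the power spectrum.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Project onto the mel filterbank and convert to dB.
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return 10.0 * np.log10(np.maximum(mel, 1e-10))

# One second of synthetic audio as a stand-in for a speech utterance.
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
S = log_mel_spectrogram(sig)
print(S.shape)  # (frames, mel bands)
```

The resulting 2-D time-frequency matrix is treated as an image and fed to the convolutional layers, which is what makes AlexNet-style architectures applicable to speech.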
