Abstract

Speech emotion recognition, the task of detecting a speaker's emotional state from speech, remains a challenging field of study and is key to enhancing interaction between humans and machines. In many settings (e.g. embedded systems), emotion must be detected in speech under tight limits on both computing and memory resources. Although several previous works reported that a reasonable recognition rate can be achieved using transfer learning with popular models such as AlexNet, those models suffer from a large size and cannot be executed on an embedded system. To address this problem, we propose a lightweight deep convolutional neural network architecture that uses only a partial component of AlexNet, with Log-Mel-Spectrograms as input. Our results show that the proposed lightweight model achieves a recognition rate comparable to the state of the art, while using around 272 times fewer parameters than AlexNet.
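The Log-Mel-Spectrogram input mentioned above is a standard front-end feature: the short-time power spectrum of the signal is mapped onto a mel-scale filterbank and converted to decibels. The abstract does not give the authors' exact parameters, so the sketch below uses common defaults (16 kHz audio, 512-point FFT, 160-sample hop, 40 mel bands) purely for illustration:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    # Frame the signal, apply a Hann window, take the power spectrum.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Project onto the mel filterbank and convert to dB.
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return 10.0 * np.log10(np.maximum(mel, 1e-10))

# One second of synthetic audio as a stand-in for a speech utterance.
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
S = log_mel_spectrogram(sig)
print(S.shape)  # (frames, mel bands)
```

The resulting 2-D time-frequency matrix is treated as an image and fed to the convolutional layers, which is what makes AlexNet-style architectures applicable to speech.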
