Abstract

The detection of human emotions from speech signals remains a challenging frontier in audio processing and human-computer interaction domains. This study introduces a novel approach to Speech Emotion Recognition (SER) using a Dendritic Layer combined with a Capsule Network (DendCaps). A Convolutional Neural Network (NN) and a Long Short-Time Neural Network (CLSTM) hybrid model are used to create a baseline which is then compared to the DendCap model. Integrating dendritic layers and capsule networks for speech emotion detection can harness the unique advantages of both architectures, potentially leading to more sophisticated and accurate models. Dendritic layers, inspired by the nonlinear processing properties of dendritic trees in biological neurons, can handle the intricate patterns and variabilities inherent in speech signals, while capsule networks, with their dynamic routing mechanisms, are adept at preserving hierarchical spatial relationships within the data, enabling the model to capture more refined emotional subtleties in human speech. The main motivation for using DendCaps is to bridge the gap between the capabilities of biological neural systems and artificial neural networks. This combination aims to capitalize on the hierarchical nature of speech data, where intricate patterns and dependencies can be better captured. Finally, two ensemble methods namely stacking and boosting are used for evaluating the CLSTM and DendCaps networks and the experimental results show that stacking of the CLSTM and DendCaps networks gives the superior result with a 75% accuracy.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call