Emotional Speech Research Articles

In speech emotion recognition (SER), our research addresses a central challenge: capturing node information, and the complex relationships between nodes, in graph representations of speech data. We introduce the Skip Graph Convolutional and Graph Attention Network (SkipGCNGAT), a model that combines skip graph convolutional networks (SkipGCNs) and graph attention networks (GATs) to meet this challenge. SkipGCN adds skip connections, which improve information flow across layers, mitigate vanishing gradients, and make deeper representation learning feasible. The GAT component assigns dynamic attention weights to neighboring nodes, letting SkipGCNGAT focus on the most relevant local and global interactions within the speech data. This enables the model to capture subtle, complex dependencies between speech segments and therefore to interpret emotional content more accurately. It also addresses a limitation of earlier single-layer graph models, which could not effectively represent such relationships across time and across speech contexts. In addition, we introduce a pre-pooling SkipGCN combination technique that integrates information from multiple layers before pooling, improving the model's capacity to capture both spatial and temporal features of speech. We evaluated SkipGCNGAT on IEMOCAP and MSP-IMPROV, two benchmark SER datasets, where it consistently achieved state-of-the-art performance. These findings demonstrate the model's effectiveness at recognizing emotions in speech and provide a foundation for future research on modeling complex relationships within speech signals for emotion recognition.
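The abstract names three mechanisms: GCN layers with skip connections, GAT attention over neighboring nodes, and a combination of SkipGCN layer outputs before pooling. The pure-Python sketch below illustrates one plausible way these pieces fit together; it is not the authors' implementation. Several choices are assumptions made only for illustration: frames are nodes linked to their temporal neighbors, the skip connection is a residual addition, the pre-pooling combination is an element-wise sum of the two SkipGCN outputs, a single-head GAT layer follows that combination, and mean pooling produces the utterance vector. All function names are hypothetical, and weights are passed in rather than learned.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """A_hat = D^-1/2 (A + I) D^-1/2, the standard GCN propagation matrix."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def skip_gcn_layer(a_hat: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One GCN step plus a skip connection: H' = ReLU(A_hat H W) + H (W must be square)."""
    conv = matmul(matmul(a_hat, h), w)
    return [[max(0.0, c) + x for c, x in zip(crow, hrow)] for crow, hrow in zip(conv, h)]


def gat_attention(adj: Matrix, z: Matrix, a_left: List[float], a_right: List[float],
                  slope: float = 0.2) -> Matrix:
    """alpha[i][j] = softmax over j of LeakyReLU(a_left . z_i + a_right . z_j),
    restricted to node i's neighbours and i itself (zero elsewhere)."""
    n = len(z)
    alpha = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j] or j == i]
        scores = []
        for j in nbrs:
            e = (sum(p * q for p, q in zip(a_left, z[i]))
                 + sum(p * q for p, q in zip(a_right, z[j])))
            scores.append(e if e > 0 else slope * e)  # LeakyReLU
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        for j, ex in zip(nbrs, exps):
            alpha[i][j] = ex / total
    return alpha


def gat_layer(adj: Matrix, h: Matrix, w: Matrix,
              a_left: List[float], a_right: List[float]) -> Matrix:
    """Attention-weighted aggregation of the projected features z = H W."""
    z = matmul(h, w)
    return matmul(gat_attention(adj, z, a_left, a_right), z)


def skipgcngat_embedding(adj: Matrix, h0: Matrix, w1: Matrix, w2: Matrix, w_att: Matrix,
                         a_left: List[float], a_right: List[float]) -> List[float]:
    """Hypothetical forward pass: two SkipGCN layers, pre-pooling sum, GAT, mean pooling."""
    a_hat = normalized_adjacency(adj)
    h1 = skip_gcn_layer(a_hat, h0, w1)
    h2 = skip_gcn_layer(a_hat, h1, w2)
    # Pre-pooling combination, assumed here to be an element-wise sum of layer outputs
    combined = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(h1, h2)]
    g = gat_layer(adj, combined, w_att, a_left, a_right)
    # Mean pooling over nodes yields one utterance-level vector for a classifier
    return [sum(col) / len(g) for col in zip(*g)]


if __name__ == "__main__":
    # Toy input: four frames with 2-D features, each linked to its temporal neighbours
    adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    h0 = [[0.1, 0.4], [0.3, 0.2], [0.5, 0.9], [0.2, 0.7]]
    eye = [[1.0, 0.0], [0.0, 1.0]]
    print(skipgcngat_embedding(adj, h0, eye, eye, eye, [0.5, -0.5], [0.3, 0.1]))
```

The residual form keeps each SkipGCN layer's input available to later layers, which is the usual motivation for skip connections in deep GCNs. Summing layer outputs before pooling is only one of several possible combination rules; concatenation or a learned weighting would also fit the abstract's description.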


Emotions play a key role in determining a person's mental state and indirectly reflect their well-being. A speech emotion recognition system can infer a person's emotions from their speech. Commonly cited universal emotions include anger, disgust, fear, happiness, pleasantness, and sadness, along with a neutral state. These emotions are especially significant in situations such as the COVID-19 pandemic, when elderly or sick people are vulnerable to depression. In this paper, we examined various classification models that operate under limited computational resources in order to determine a person's emotion from their speech. Prosodic features such as pitch, loudness, and tone, together with spectral features such as Mel-frequency cepstral coefficients (MFCCs), were used to analyze a speaker's emotions. Although state-of-the-art sequence-to-sequence models for speech detection offer high accuracy and precision, they are computationally expensive and inefficient. Therefore, this work focuses on analyzing and comparing classification algorithms, including a multilayer perceptron (MLP), a decision tree, and a support vector machine, as well as deep neural networks such as a convolutional neural network (CNN) and a long short-term memory (LSTM) network. Given an audio file, the emotions exhibited by the speaker were recognized using machine learning and deep learning techniques. A comparative study was performed to identify the algorithms best suited to emotion recognition. In our experiments, the MLP classifier and the CNN model achieved higher accuracy with smaller variation than the other models studied.
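The abstract lists pitch, loudness, and MFCCs as inputs but does not specify how they were computed. The pure-Python sketch below shows textbook versions of three such frame-level features; it is an illustration under assumptions, not the paper's pipeline. Assumed choices: RMS energy as the loudness proxy, autocorrelation peak picking in a 60 to 400 Hz range for pitch, and an MFCC chain of Hamming window, naive DFT power spectrum, 20 triangular filters on the HTK mel scale, log, and an unnormalized DCT-II keeping 13 coefficients. "Tone" is not modeled, and all function names are hypothetical.

```python
import math
from typing import List


def rms_loudness(frame: List[float]) -> float:
    """Root-mean-square energy of a frame, a common proxy for loudness."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def autocorr_pitch(frame: List[float], sr: int, fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Pitch estimate: the lag in [sr/fmax, sr/fmin] with the largest autocorrelation."""
    lo, hi = int(sr / fmax), int(sr / fmin)
    best_lag = max(range(lo, min(hi, len(frame) - 1) + 1),
                   key=lambda lag: sum(frame[n] * frame[n - lag] for n in range(lag, len(frame))))
    return sr / best_lag


def hz_to_mel(f: float) -> float:
    """HTK-style mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc_frame(frame: List[float], sr: int, n_filters: int = 20, n_ceps: int = 13) -> List[float]:
    """MFCCs of one frame: power spectrum -> triangular mel filterbank -> log -> DCT-II."""
    n = len(frame)
    # Hamming window, then the power spectrum via a naive DFT over bins 0..n/2
    win = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(frame)]
    n_bins = n // 2 + 1
    power = []
    for k in range(n_bins):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(win))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(win))
        power.append((re * re + im * im) / n)
    # Filter edges equally spaced on the mel scale, mapped to spectrum bin indices
    top = hz_to_mel(sr / 2)
    mel_pts = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(mel_to_hz(m) / (sr / 2) * (n_bins - 1)) for m in mel_pts]
    log_energies = []
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        e = 0.0
        for k in range(left, right + 1):
            if left <= k < center:
                weight = (k - left) / (center - left)  # rising edge
            elif center <= k <= right and right > center:
                weight = (right - k) / (right - center)  # falling edge
            else:
                weight = 0.0
            e += weight * power[k]
        log_energies.append(math.log(e + 1e-10))  # small floor avoids log(0)
    # Unnormalized DCT-II decorrelates the log filterbank energies
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters) for j, e in enumerate(log_energies))
            for c in range(n_ceps)]


if __name__ == "__main__":
    sr = 8000
    # A synthetic 50 ms, 200 Hz tone stands in for a voiced speech frame
    frame = [math.sin(2 * math.pi * 200 * i / sr) for i in range(400)]
    print(rms_loudness(frame), autocorr_pitch(frame, sr), mfcc_frame(frame, sr)[:3])
```

In practice, a library routine with an FFT would replace the naive DFT, and per-frame vectors such as `[rms, pitch] + mfcc` would be aggregated over an utterance before being fed to the classifiers compared in the study.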


Articles published on Emotional Speech

Topology-adaptive Bayesian optimization for deep ring echo state networks in speech emotion recognition

Speech Emotion Recognition Using Feedforward Neural Network

Brhamo: metaheuristic optimization algorithm for speech emotion recognition using spectral and hybrid features

Comparative Analysis of Spectrogram and MFCC Representations for Speech Emotion Recognition Using Machine Learning

Optimization Research on Integrating Mental Health Into Ideological and Political Theory Courses Through Deep Learning Network Applications

Polish Speech and Text Emotion Recognition in a Multimodal Emotion Analysis System

Public Communication, Emotions and Affects: Communication Strategies of Federal Universities on Instagram

Hierarchical convolutional neural networks with post-attention for speech emotion recognition

Speech Emotion Recognition Using Transfer Learning: Integration of Advanced Speaker Embeddings and Image Recognition Models

Enhanced Speech Emotion Recognition Using Conditional-DCGAN-Based Data Augmentation

Graph Neural Network-Based Speech Emotion Recognition: A Fusion of Skip Graph Convolutional Networks and Graph Attention Networks

CNN-Based Models for Emotion and Sentiment Analysis Using Speech Data

Research on Speech Emotion Recognition Method Based on ResSE_CNN1D

Emotion Classification from Speech Waveform Using Machine Learning and Deep Learning Techniques

Classification of Infant Crying Sounds Using SE-ResNet-Transformer

ESERNet: Learning spectrogram structure relationship for effective speech emotion recognition with swin transformer in classroom discourse analysis

DSTM: A transformer-based model with dynamic-static feature fusion in speech emotion recognition

Graph-based multi-feature fusion method for speech emotion recognition

Data augmentation using a 1D-CNN model with MFCC/MFMC features for speech emotion recognition

Multimodal speech emotion recognition optimization using genetic algorithm
