Emotional Speech Research Articles

Speech-based solutions are an appealing way to communicate in human-robot interaction (HRI). An important challenge in this area is processing distant speech, which is often noisy and affected by reverberation and time-varying acoustic channels. It is important to investigate effective speech solutions, especially in dynamic environments where the robots and the users move, changing the distance and orientation between the speaker and the microphone. This paper addresses this problem in the context of speech emotion recognition (SER), an important task for understanding the intention of the message and the underlying mental state of the user. We propose a novel setup with a PR2 robot that moves while target speech and ambient noise are simultaneously recorded. Our study not only analyzes the detrimental effect of distant speech on speech emotion recognition in this dynamic robot-user setting but also provides solutions to attenuate its effect. We evaluate two beamforming schemes to spatially filter the speech signal: delay-and-sum (D&S) and minimum variance distortionless response (MVDR). We consider two training conditions: the original training speech recorded in controlled situations, and simulated conditions in which the training utterances are processed to simulate the target acoustic environment. We also consider the cases where the robot is moving (dynamic case) and not moving (static case). For speech emotion recognition, we explore two state-of-the-art classifiers: one using hand-crafted features, implemented with the ladder network strategy, and one using learned features, implemented with the wav2vec 2.0 feature representation. MVDR led to a higher signal-to-noise ratio than the basic D&S method. However, both approaches provided very similar average concordance correlation coefficient (CCC) improvements, equal to 116%, on the HRI subsets using the ladder network trained with the original MSP-Podcast training utterances. For the wav2vec 2.0-based model, only D&S led to improvements. Surprisingly, the static and dynamic HRI testing subsets resulted in a similar average CCC. Finally, simulating the acoustic environment in the training dataset provided the highest average CCC scores on the HRI subsets, just 29% and 22% lower than those obtained with the original training/testing utterances, for the ladder network and wav2vec 2.0, respectively.
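The concordance correlation coefficient used as the evaluation metric above can be illustrated with a minimal sketch. It follows the standard definition, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), computed with population statistics. The function name `concordance_cc` and the zero-denominator convention are assumptions made for this illustration; they are not taken from the paper.

```python
from statistics import fmean


def concordance_cc(predicted, reference):
    """Concordance correlation coefficient between two equal-length sequences.

    Hypothetical helper for illustration; uses population (1/N) statistics.
    """
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_p = fmean(predicted)
    mean_r = fmean(reference)

    # Population variances and covariance.
    var_p = fmean([(p - mean_p) ** 2 for p in predicted])
    var_r = fmean([(r - mean_r) ** 2 for r in reference])
    cov = fmean([(p - mean_p) * (r - mean_r)
                 for p, r in zip(predicted, reference)])

    denom = var_p + var_r + (mean_p - mean_r) ** 2
    if denom == 0:
        # Both sequences are constant and equal; the CCC is undefined here.
        # Returning 0.0 is a convention chosen for this sketch only.
        return 0.0
    return 2 * cov / denom


# Perfect agreement gives 1; a constant offset lowers the score even
# though the Pearson correlation would still be 1.
print(concordance_cc([1, 2, 3], [1, 2, 3]))
print(concordance_cc([2, 3, 4], [1, 2, 3]))
```

Unlike plain correlation, the squared mean difference in the denominator penalizes systematic bias between predictions and labels, which is why the metric is commonly used for dimensional emotion attributes.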

With the expansion of speech emotion recognition in the consumer domain, several devices, particularly smart home personal assistants designed for the elderly, have become widely available on the market. Increasing processing power and connectivity, together with the growing need to support longer residency through technological interventions, highlight the potential benefits of smart home assistants. Enabling these assistants to recognize human emotions would greatly improve user-assistant communication, allowing the assistant to deliver more constructive and customized feedback to the user. In this work, Modeling and Sentiment Analysis of Social Relationships in Elderly Smart Homes Based on Graph Neural Networks (SASR-MBHNN-BBOA) is proposed. The input data are collected from the Social Recommendation Dataset. The input data are then pre-processed using Inverse Optimal Safety Filters (IOSF) to clean the data and remove background noise. The pre-processed data are passed to a Memristive Bi-neuron Hopfield Neural Network (MBHNN), which predicts the sentiment as positive, negative, or neutral. In general, MBHNN does not include an optimization approach for determining the optimal parameters needed to predict sentiments accurately. Hence, BBOA is proposed to optimize the MBHNN classifier so that it precisely predicts sentiments in the elderly smart home. The proposed SASR-MBHNN-BBOA method is implemented in Python and assessed with several performance metrics: accuracy, precision, recall, F1-score, and ROC. The outcomes show that SASR-MBHNN-BBOA attains 20.8%, 19.5%, and 29.6% higher accuracy; 28.8%, 22.5%, and 32.6% higher precision; and 15.5%, 27.4%, and 18.2% higher recall when compared with the existing methods Emotional speech analysis in real time for smart home assistants (SASR-CNN-SHA), Machine Learning to Investigate Elderly Care Requirements in China via the Lens of Family Caregivers (SASR-ML-IECR), and Identifying User Emotions via Audio Conversations with Smart Assistants (SASR-DNN-EASA), respectively.
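For readers unfamiliar with the metrics named above, the following is a minimal sketch of how accuracy, precision, recall, and F1-score can be computed for a three-class sentiment task. The abstract does not state how per-class scores are averaged, so the macro averaging used here is an assumption, as are the function names `per_class_metrics` and `macro_metrics` and the toy labels.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall and F1 for one class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)

    # Scores are defined as 0.0 when their denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def macro_metrics(y_true, y_pred, labels=("positive", "negative", "neutral")):
    """Accuracy plus macro-averaged precision, recall and F1 (assumed averaging)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    scores = [per_class_metrics(y_true, y_pred, label) for label in labels]
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(s[0] for s in scores) / n,
        "recall": sum(s[1] for s in scores) / n,
        "f1": sum(s[2] for s in scores) / n,
    }


# Toy labels for illustration only; not data from the study.
truth = ["positive", "negative", "neutral", "positive"]
preds = ["positive", "neutral", "neutral", "negative"]
print(macro_metrics(truth, preds))
```

Macro averaging weights each sentiment class equally regardless of how often it occurs; a weighted or micro average would give different numbers on imbalanced data.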

Articles published on Emotional Speech

EmoBone: A Multinational Audio Dataset of Emotional Bone Conducted Speech

Self-supervised Learning for Speech Emotion Recognition Task Using Audio-visual Features and Distil Hubert Model on BAVED and RAVDESS Databases

Speech Emotion Recognition Using a Multi-Time-Scale Approach to Feature Aggregation and an Ensemble of SVM Classifiers

Emotion recognition for human–computer interaction using high-level descriptors

Speech Emotion Recognition

“Love looks not with the eyes”: supranormal processing of emotional speech in individuals with late-blindness versus preserved processing in individuals with congenital-blindness

Speech emotion recognition in real static and dynamic human-robot interaction scenarios

Bi-Modal Bi-Task Emotion Recognition Based on Transformer Architecture

Speech Emotion Recognition Based on Temporal-Spatial Learnable Graph Convolutional Neural Network

Speech emotion recognition systems and their security aspects

A Multimodal Fusion Behaviors Estimation Method for Public Dangerous Monitoring

Speech Emotion Recognition using Machine Learning

Age-related differences in processing of emotions in speech disappear with babble noise in the background

Joint enhancement and classification constraints for noisy speech emotion recognition

Real-time Speech Emotion Recognition using Machine Learning

Modeling and Sentiment Analysis of Social Relationships in Elderly Smart Homes Based on Graph Neural Networks

A State-of-arts Review of Deep Learning Techniques for Speech Emotion Recognition

Chinese Emotional Speech Audiometry Project (CESAP): Establishment and Validation of a New Material Set With Emotionally Neutral Disyllabic Words

Speech Emotion Recognition Based on Machine Learning

Enhancing speech emotion recognition through deep learning and handcrafted feature fusion
