Unweighted Accuracy Research Articles

Speech emotion recognition (SER) systems have become essential in various fields, including intelligent healthcare, customer service, call centers, automatic translation systems, and human–computer interaction. However, current approaches predominantly rely on single frame-level or utterance-level features, offering only shallow or deep characterization, and fail to fully exploit the diverse types, levels, and scales of emotion features. The limited ability of single features to capture speech emotion information, along with the ineffective combination of different features’ complementary advantages through simple fusion, poses significant challenges. To address these issues, this paper presents a novel method: spatio-temporal representation learning enhanced speech emotion recognition with multi-head attention mechanisms (STRL-SER). The proposed technique integrates fine-grained frame-level features and coarse-grained utterance-level emotion features, while employing separate modules to extract deep representations at different levels. In the frame-level module, we introduce parallel networks and utilize a bidirectional long short-term memory network (BiLSTM) and an attention-based multi-scale convolutional neural network (CNN) to capture the spatio-temporal representation details of diverse frame-level signals. We then extract deep representations of utterance-level features to effectively learn global speech emotion features. To leverage the advantages of different feature types, we introduce a multi-head attention mechanism that fuses the deep representations from various levels. This fusion approach retains the distinctive qualities of each feature type. Finally, we employ segment-level multiplexed decision making to generate the final classification results. We evaluate the effectiveness of our proposed method on two widely recognized benchmark datasets: IEMOCAP and RAVDESS. The results demonstrate that our method achieves notable performance improvements compared to previous studies. On the IEMOCAP dataset, our method achieves a weighted accuracy (WA) of 81.60% and an unweighted accuracy (UA) of 79.32%. Similarly, on the RAVDESS dataset, we achieve a WA of 88.88% and a UA of 87.85%. These outcomes confirm the substantial advancements realized by our proposed method.
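
The abstract reports WA and UA without defining them. As a minimal sketch, assuming the convention common in the SER literature (WA is the overall fraction of correct predictions, UA is the mean of per-class recalls), the two metrics can be computed as follows. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all predictions."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    total = defaultdict(int)  # samples per true class
    hit = defaultdict(int)    # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += int(t == p)
    recalls = [hit[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced toy labels: six "neutral" and two "angry" utterances.
y_true = ["neutral"] * 6 + ["angry"] * 2
y_pred = ["neutral"] * 6 + ["neutral", "angry"]

print(weighted_accuracy(y_true, y_pred))    # 0.875
print(unweighted_accuracy(y_true, y_pred))  # 0.75
```

On imbalanced data the two values diverge: the majority class dominates WA, while UA penalizes the missed minority-class utterance more heavily.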

Atherosclerotic cardiovascular disease (ASCVD), which includes coronary heart disease (CHD) and ischemic stroke, is the leading cause of mortality globally. According to the European Society of Cardiology (ESC), 26 million people worldwide have heart disease, with 3.6 million diagnosed each year. Early detection of heart disease will aid in lowering the mortality rate. The key issues in current research on heart disease prediction using artificial intelligence are the lack of diversity in training data and the difficulty of interpreting the findings of complicated AI models. To address these issues, this paper proposes cardiac disease prediction using AI algorithms with SelectKBest. Features are standardized, balanced, and selected using the StandardScaler, SMOTE, and SelectKBest techniques, respectively. Machine learning models such as support vector machine (SVM), K-nearest neighbor (KNN), decision tree (DT), logistic regression (LR), adaptive boosting (AB), naive Bayes (NB), random forest (RF), and extra tree (ET), and deep learning models such as vanilla long short-term memory (LSTM), bidirectional LSTM, stacked LSTM, and deep neural network (DNN), are assessed using the Alizadeh Sani, combined (Cleveland, Hungarian, Switzerland, Long Beach VA, and Statlog), and Pakistan heart failure datasets. In the evaluation, the proposed DNN with SelectKBest showed promising results in predicting heart disease: in tenfold cross-validation experiments, it achieved an unweighted accuracy of 99% on Alizadeh Sani, 98% on the combined dataset, and 97% on the Pakistan dataset. The suggested approach can be utilized to diagnose heart disease in its early stages.
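
The class-balancing step named in the abstract (SMOTE) creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The following is a simplified, standard-library sketch of that idea only. It is not the authors' code and not the imbalanced-learn implementation, and the function name, the parameters, and the toy data are hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Simplified illustration of the SMOTE idea, assuming numeric features
    and at least two distinct minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Hypothetical toy data: four minority-class points with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_oversample(minority, n_new=6)
print(len(new_points))  # 6
```

Each synthetic point lies on a line segment between two existing minority samples, so the oversampled class stays inside the region the real samples already occupy.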

Articles published on Unweighted Accuracy

Speech Emotion Recognition Using Transfer Learning: Integration of Advanced Speaker Embeddings and Image Recognition Models

Multimodal Information Fusion and Data Generation for Evaluation of Second Language Emotional Expression

Multimodal fusion: A study on speech-text emotion recognition with the integration of deep learning

Multi-task transfer learning for the prediction of entity modifiers in clinical text: application to opioid use disorder case detection

Speech Emotion Recognition Using Dual-Stream Representation and Cross-Attention Fusion

Bi-Modal Bi-Task Emotion Recognition Based on Transformer Architecture

MVIB-DVA: Learning minimum sufficient multi-feature speech emotion embeddings under dual-view aware

Speech emotion recognition based on multi-dimensional feature extraction and multi-scale feature fusion

Spatio-temporal representation learning enhanced speech emotion recognition with multi-head attention mechanisms

Speech emotion recognition based on Graph-LSTM neural network

A BiLSTM–Transformer and 2D CNN Architecture for Emotion Recognition from Speech

Cardiac disease prediction using AI algorithms with SelectKBest

Semi-supervised cross-lingual speech emotion recognition

Research on Speech Emotion Recognition Based on Teager Energy Operator Coefficients and Inverted MFCC Feature Fusion

Emotion Recognition using Deep Learning

Speech emotion recognition based on syllable-level feature extraction

Speech Emotion Recognition Using Cascaded Attention Network with Joint Loss for Discrimination of Confusions

Multi-Level Attention-Based Categorical Emotion Recognition Using Modulation-Filtered Cochleagram

Speech emotion recognition based on emotion perception

TC-Net: A Modest & Lightweight Emotion Recognition System Using Temporal Convolution Network
