Related Topics

  • Speech Emotion Recognition System
  • Emotion Recognition System
  • Emotional Speech Database
  • Emotion Recognition
  • Affective Computing

Articles published on Emotional Speech

2553 search results, sorted by recency (filters available for authors, journals, and duration)
  • New
  • Open Access
  • Research Article
  • Citations: 1
  • 10.1016/j.ijcce.2024.11.008
Speech Emotion Recognition Algorithm of Intelligent Robot Based on ACO-SVM
  • Dec 1, 2025
  • International Journal of Cognitive Computing in Engineering
  • Xueliang Kang

  • New
  • Research Article
  • 10.1016/j.engappai.2025.112152
Temporal-frequency joint hierarchical transformer with dynamic windows for speech emotion recognition
  • Dec 1, 2025
  • Engineering Applications of Artificial Intelligence
  • Yonghong Fan + 3 more

  • New
  • Research Article
  • 10.1016/j.asoc.2025.113915
Improving speech emotion recognition using gated cross-modal attention and multimodal homogeneous feature discrepancy learning
  • Dec 1, 2025
  • Applied Soft Computing
  • Feng Li + 2 more

  • New
  • Research Article
  • 10.1016/j.artmed.2025.103279
TSFNet: A Temporal-Spectral Fusion Network for advanced speech emotion recognition in medical applications.
  • Dec 1, 2025
  • Artificial intelligence in medicine
  • Xinran Li + 5 more

  • New
  • Research Article
  • 10.1016/j.measurement.2025.118165
A filtering approach for speech emotion recognition using wavelet approximation coefficient
  • Dec 1, 2025
  • Measurement
  • Ravi + 1 more

  • New
  • Research Article
  • 10.1016/j.ins.2025.122956
EmoDim: An independent dimensional contrastive learning with pseudo-labeling for speech emotion recognition
  • Dec 1, 2025
  • Information Sciences
  • Bao Thang Ta + 3 more

  • New
  • Research Article
  • 10.1016/j.apacoust.2025.110905
Multilingual speech emotion recognition using IGRFXG – Ensemble feature selection approach
  • Dec 1, 2025
  • Applied Acoustics
  • Astha Tripathi + 1 more

  • New
  • Research Article
  • 10.1016/j.eswa.2025.128605
Bimodal speech emotion recognition via contrastive self-alignment learning
  • Dec 1, 2025
  • Expert Systems with Applications
  • Chang Wang + 3 more

  • New
  • Research Article
  • 10.54097/bk7f6783
An Analysis of AI-Based Emotional Recognition: Main Methods Based on Four Modalities
  • Nov 27, 2025
  • Academic Journal of Science and Technology
  • Tian Jing

Artificial Intelligence (AI) has become a useful tool in human emotion recognition, with a broad range of applications. To serve these applications better, numerous studies have been conducted, and the related technologies are developing rapidly. This review broadly explores the main AI-based methods in emotion recognition. It begins with facial emotion recognition (FER), analyzing its general workflow (from database construction through preprocessing and feature extraction to machine learning); as the review shows, this flow commonly applies to the other three modalities as well. Then, speech emotion recognition (SER) is briefly discussed, focusing on its feature extraction and classification stages (classification being part of machine learning). Subsequently, emotion recognition from physiological signals is explored in depth, owing to its passive nature and resistance to deliberate control. Among the variety of physiological signals, the review concentrates on electroencephalographic (EEG) and electrocardiographic (ECG) signals. Afterwards, textual emotion recognition (TER) is briefly introduced, outlining four basic methods. Finally, the review summarizes the challenges that arise in nearly every emotion recognition experiment. Additionally, the strengths and limitations of each modality are presented in the discussion section. The highlight of the review is its systematic analysis of the basic methods of using AI to recognize emotion.
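
The database → preprocessing → feature extraction → machine learning flow the review describes is easy to make concrete for the speech modality. Below is a minimal, illustrative sketch assuming librosa and scikit-learn; the file names and labels are hypothetical placeholders, not taken from the review.

```python
# Minimal sketch of the database -> preprocessing -> features -> classifier
# flow, applied to SER. Paths and labels are hypothetical placeholders;
# any MFCC-based classifier would illustrate the same pipeline.
import numpy as np
import librosa
from sklearn.svm import SVC

def mfcc_features(path, n_mfcc=13):
    # Load audio, compute MFCCs, and average over time to get one
    # fixed-length vector per utterance (a common, simple summary).
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

# Hypothetical corpus: (wav path, emotion label) pairs.
train = [("happy_01.wav", "happy"), ("angry_01.wav", "angry")]
X = np.stack([mfcc_features(p) for p, _ in train])
y = [label for _, label in train]

clf = SVC(kernel="rbf").fit(X, y)   # classification stage
print(clf.predict(X))               # sanity check on training data
```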

  • New
  • Research Article
  • 10.4218/etrij.2025-0058
Mix‐MaxETTS: A text‐to‐emotional speech synthesis model based on a deep encoder–decoder structure for the transfer of secondary emotions
  • Nov 26, 2025
  • ETRI Journal
  • Seyyed Mahdi Hassani + 1 more

Given the importance of emotions in social interactions, emotional speech synthesis has attracted significant attention in the field of human–computer interaction. Remarkable advancements have been made in emotional text‐to‐speech synthesis, but most previous studies have concentrated on imitating styles associated with a specific primary emotion, neglecting secondary emotions that arise from mixtures of primary emotions. Therefore, there is a need to leverage both primary and secondary emotions in speech synthesis to facilitate more engaging, realistic, and natural interactions among artificial social agents. To address this gap, we propose a text‐to‐emotional speech synthesis model designed to generate nuanced mixtures of emotions that effectively convey secondary emotions during interactions. By adjusting the values of each basic emotion, we can control the mix of emotions in the synthetic speech. Our proposed method distinguishes between primary emotions and variations in mixed emotions while learning emotional styles. The effectiveness of the proposed framework was validated through both objective and subjective evaluations.
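
As an illustration of the mixing idea (not the authors' model), the sketch below blends primary-emotion style vectors into one conditioning vector by adjusting per-emotion weights. The embedding size, the emotion set, and the weights are assumptions; in a real system the embeddings would be learned style vectors.

```python
# Illustrative sketch: express a "secondary" emotion as a weighted mix
# of primary-emotion embeddings that a TTS decoder could condition on.
# Random vectors stand in for learned style embeddings (assumption).
import numpy as np

rng = np.random.default_rng(0)
primary = ["happy", "sad", "angry", "surprised"]
emb = {e: rng.normal(size=64) for e in primary}   # learned in practice

def mixed_style(weights):
    # weights: {emotion: intensity}; normalize and blend linearly.
    total = sum(weights.values())
    return sum(w / total * emb[e] for e, w in weights.items())

# e.g. a "bittersweet" style: mostly sad with some happy
style = mixed_style({"sad": 0.7, "happy": 0.3})
print(style.shape)  # (64,) conditioning vector for the decoder
```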

  • New
  • Research Article
  • 10.3390/e27121201
Quantum AI in Speech Emotion Recognition
  • Nov 26, 2025
  • Entropy
  • Michael Norval + 1 more

We evaluate a hybrid quantum–classical pipeline for speech emotion recognition (SER) on a custom Afrikaans corpus using MFCC-based spectral features with pitch and energy variants, explicitly comparing three quantum approaches—a variational quantum classifier (VQC), a quantum support vector machine (QSVM), and a Quantum Approximate Optimisation Algorithm (QAOA)-based classifier—against a CNN–LSTM (CLSTM) baseline. We detail the classical-to-quantum data encoding (angle embedding with bounded rotations and an explicit feature-to-qubit map) and report test accuracy, weighted precision, recall, and F1. Under ideal analytic simulation, the quantum models reach 41–43% test accuracy; under a realistic 1% NISQ noise model (100–1000 shots) this degrades to 34–40%, versus 73.9% for the CLSTM baseline. Despite the markedly lower empirical accuracy—expected in the NISQ era—we provide an end-to-end, noise-aware hybrid SER benchmark and discuss the asymptotic advantages of quantum subroutines (Chebyshev-based quantum singular value transformation, quantum walks, and block encoding) that become relevant only in the fault-tolerant regime.
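
The classical-to-quantum angle-embedding step can be sketched without a quantum SDK: each normalized feature becomes a bounded RY rotation on its own qubit, and the register state is the tensor product of the per-qubit states. The [0, π] scaling and the toy feature values below are assumptions, not the paper's exact encoding.

```python
# Plain-NumPy sketch of angle embedding with bounded rotations and an
# explicit feature-to-qubit map (one qubit per feature).
import numpy as np

def angle_embed(features):
    # Map features in [0, 1] to angles in [0, pi] (bounded rotations).
    thetas = np.clip(features, 0.0, 1.0) * np.pi
    # Single-qubit state RY(theta)|0> = [cos(theta/2), sin(theta/2)].
    qubits = [np.array([np.cos(t / 2), np.sin(t / 2)]) for t in thetas]
    # Full register state is the tensor product of per-qubit states.
    state = qubits[0]
    for q in qubits[1:]:
        state = np.kron(state, q)
    return state

mfcc_summary = np.array([0.2, 0.8, 0.5, 0.1])  # toy normalized features
state = angle_embed(mfcc_summary)
print(state.shape, np.isclose(np.linalg.norm(state), 1.0))  # (16,) True
```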

  • New
  • Research Article
  • 10.1038/s41598-025-25874-9
Improving emotional connection of human and machine using Deep Maxout Networks optimized through Modified Water Cycle optimizer.
  • Nov 25, 2025
  • Scientific reports
  • Jun Zhao + 2 more

The precise identification and understanding of human emotions by computers is crucial for natural interaction between humans and machines. This research presents a novel approach for identifying emotions in speech through the integration of deep learning and metaheuristic techniques. The approach uses Deep Maxout Networks (DMN) as the primary framework and enhances them with a modified version of the Water Cycle Algorithm (MWCA), which tunes the architectural parameters of the DMN and optimizes its capability to recognize emotions from speech signals. The model employs Mel-Frequency Cepstral Coefficients (MFCC) to extract features from speech input, enabling effective differentiation between numerous emotional states. The model was assessed on two datasets, CASIA and Emo-DB, achieving an average accuracy of 93.1% and an F1-score of 92.4% on Emo-DB, outperforming baseline models with statistically significant improvements (p < 0.01). This research contributes to emotional interaction design by providing a robust tool for computers to understand and react to users' emotions, ultimately improving the overall user experience.
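
For readers unfamiliar with maxout units, the sketch below shows the basic building block a Deep Maxout Network stacks: each unit outputs the maximum over k affine "pieces". All shapes are illustrative assumptions, and the paper's MWCA optimization is not reproduced here.

```python
# Minimal maxout layer: element-wise max over k affine maps, giving a
# learned piecewise-linear activation with no fixed nonlinearity.
import numpy as np

def maxout_layer(x, W, b):
    # x: (d_in,), W: (k, d_out, d_in), b: (k, d_out).
    z = np.einsum("koi,i->ko", W, x) + b    # (k, d_out) affine pieces
    return z.max(axis=0)                    # (d_out,) max across pieces

rng = np.random.default_rng(1)
x = rng.normal(size=40)                     # e.g. an MFCC feature vector
W = rng.normal(size=(3, 16, 40))            # k=3 pieces, 16 output units
b = rng.normal(size=(3, 16))
print(maxout_layer(x, W, b).shape)          # (16,)
```

A metaheuristic such as MWCA would then search over choices like the number of pieces k and the layer widths, scoring each candidate architecture by validation accuracy.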

  • New
  • Research Article
  • 10.1038/s41598-025-28686-z
FedSER-XAI: PSO-optimized multi-stream cross-attention transformer with graph features for explainable federated speech emotion recognition.
  • Nov 24, 2025
  • Scientific reports
  • Eman Abdulrahman Alkhamali + 3 more

Federated learning for speech emotion recognition faces fundamental challenges in simultaneously achieving high performance, privacy preservation, and model interpretability. This paper introduces FedSER-XAI, a novel framework that integrates Particle Swarm Optimization (PSO)-based feature selection, multi-stream cross-attention mechanisms, and graph-based feature extraction within an explainable federated learning architecture. Our approach combines Vision Transformer processing of mel-spectrograms with temporal-spatial graph convolutional networks to capture both contextual and structural speech relationships. The PSO algorithm achieves 78.1% dimensionality reduction (228→50 features) while improving discriminative power. The multi-stream architecture processes traditional acoustic features alongside novel graph-based representations derived from visibility and correlation graphs, fused through Transformer-based cross-attention mechanisms. Extensive evaluation on EMODB and SAVEE datasets demonstrates exceptional performance: 99.9% and 97.2% accuracy in centralized settings, with remarkable federated performance achieving global model accuracies of 99.7% (EMODB) and 97.2% (SAVEE) across 8 emotion-specialized clients, representing only 0.2% and 0.0% degradation compared to centralized training. The framework achieves rapid convergence within 10 communication rounds, representing minimal performance degradation (0.2% for EMODB) while preserving privacy. Cross-dataset evaluation on CREMA-D yields 68% accuracy, demonstrating reasonable generalization. The comprehensive explainability framework using SHAP and LIME provides global and local interpretations, validating that graph-based features contribute significantly to emotion discrimination. FedSER-XAI represents the first explainable federated speech emotion recognition system, advancing trustworthy AI for sensitive healthcare and human-computer interaction applications.
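
The PSO-based feature-selection step can be illustrated with a standard binary PSO over feature masks, in the spirit of the 228 → 50 reduction reported above. The fitness function, relevance scores, and all constants below are toy assumptions, not the paper's objective or data.

```python
# Sketch of binary PSO feature selection: particles are 0/1 masks over
# 228 features; a toy fitness rewards relevant features and penalizes
# subset size, pushing the swarm toward small, informative subsets.
import numpy as np

rng = np.random.default_rng(2)
n_feat, n_particles, n_iter = 228, 20, 40
relevance = rng.random(n_feat)   # stand-in per-feature relevance score

def fitness(mask):
    return (relevance * mask).sum() - 0.2 * mask.sum()

v = rng.normal(size=(n_particles, n_feat))                   # velocities
x = (rng.random((n_particles, n_feat)) > 0.5).astype(float)  # 0/1 masks
pbest = x.copy()
pbest_fit = np.array([fitness(m) for m in x])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random((2, n_particles, n_feat))
    v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
    # Sigmoid of velocity gives each bit's probability of being 1.
    x = (rng.random((n_particles, n_feat)) < 1 / (1 + np.exp(-v))).astype(float)
    fit = np.array([fitness(m) for m in x])
    better = fit > pbest_fit
    pbest[better], pbest_fit[better] = x[better], fit[better]
    gbest = pbest[pbest_fit.argmax()].copy()

print(int(gbest.sum()), "of", n_feat, "features kept")
```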

  • New
  • Research Article
  • Cite Count Icon 1
  • 10.1038/s41598-025-27871-4
Representation learning with parameterised quantum circuits for advancing speech emotion recognition.
  • Nov 22, 2025
  • Scientific reports
  • Thejan Rajapakshe + 4 more

Quantum machine learning (QML) offers a promising avenue for advancing representation learning in complex signal domains. In this study, we investigate the use of parameterised quantum circuits (PQCs) for speech emotion recognition (SER), a challenging task due to the subtle temporal variations and overlapping affective states in vocal signals. We propose a hybrid quantum-classical architecture that integrates PQCs into a conventional convolutional neural network (CNN), leveraging quantum properties such as superposition and entanglement to enrich emotional feature representations. Experimental evaluations on three benchmark datasets (IEMOCAP, RECOLA, and MSP-IMPROV) demonstrate that our hybrid model achieves improved classification performance relative to a purely classical CNN baseline, with over 50% reduction in trainable parameters. Furthermore, Adjusted Rand Index (ARI) analysis demonstrates that the quantum model yields feature representations with improved alignment to true emotion classes compared with the classical model, reinforcing the observed performance gains. This work provides early evidence of the potential for QML to enhance emotion recognition and lays the foundation for future quantum-enabled affective computing systems.

  • New
  • Research Article
  • 10.1007/s00034-025-03410-4
Assessing the Effectiveness of Feature Normalization and Dataset Quality in Speech Emotion Recognition Across Diverse Emotional and Linguistic Contexts
  • Nov 16, 2025
  • Circuits, Systems, and Signal Processing
  • Swapna Mol George + 1 more

  • Research Article
  • 10.1007/s00034-025-03408-y
A Multi-branch Interactive Attention Network Based on Self-Distillation for Speech Emotion Recognition
  • Nov 15, 2025
  • Circuits, Systems, and Signal Processing
  • Yuanyuan Wei + 4 more

  • Research Article
  • 10.54097/hq3rzm08
Review Of Emotion Recognition Technology Methods Based on Deep Recognition
  • Nov 13, 2025
  • Academic Journal of Science and Technology
  • Zhizhou Lu

With the rapid advancement of artificial intelligence, sensor technology, and big data analytics, the way humans interact with machines is shifting from an "instruction-based" mode to a "perception-based" one. As emotions are the core driving force behind human behaviors and decisions, the objective and quantitative detection of emotions has become a focus of interdisciplinary research. Studies have shown that emotions can be detected and analyzed through four methods: facial expression recognition, speech emotion recognition, text sentiment analysis, and physiological signal recognition. Reducing errors in emotion recognition greatly aids the development and widespread adoption of human-computer interaction. Research on emotion detection is not only an inevitable outcome of technological progress but also key to addressing practical needs. This paper analyzes deep learning models such as CNN, LSTM, and SENN, and summarizes their advantages and disadvantages. It challenges the traditional perception that "emotions cannot be quantified," enabling machines to move from "understanding language" to "understanding the human heart," and ultimately promoting the harmonious coexistence of technology and human society.

  • Research Article
  • 10.3390/bdcc9110285
Cross-Lingual Bimodal Emotion Recognition with LLM-Based Label Smoothing
  • Nov 12, 2025
  • Big Data and Cognitive Computing
  • Elena Ryumina + 4 more

Bimodal emotion recognition based on audio and text is widely adopted in video-constrained real-world applications such as call centers and voice assistants. However, existing systems suffer from limited cross-domain generalization and monolingual bias. To address these limitations, a cross-lingual bimodal emotion recognition method is proposed, integrating Mamba-based temporal encoders for audio (Wav2Vec2.0) and text (Jina-v3) with a Transformer-based cross-modal fusion architecture (BiFormer). Three corpus-adaptive augmentation strategies are introduced: (1) Stacked Data Sampling, in which short utterances are concatenated to stabilize sequence length; (2) Label Smoothing Generation based on Large Language Model, where the Qwen3-4B model is prompted to detect subtle emotional cues missed by annotators, producing soft labels that reflect latent emotional co-occurrences; and (3) Text-to-Utterance Generation, in which emotionally labeled utterances are generated by ChatGPT-5 and synthesized into speech using the DIA-TTS model, enabling controlled creation of affective audio–text pairs without human annotation. BiFormer is trained jointly on the English Multimodal EmotionLines Dataset and the Russian Emotional Speech Dialogs corpus, enabling cross-lingual transfer without parallel data. Experimental results show that the optimal data augmentation strategy is corpus-dependent: Stacked Data Sampling achieves the best performance on short, noisy English utterances, while Label Smoothing Generation based on Large Language Model better captures nuanced emotional expressions in longer Russian utterances. Text-to-Utterance Generation does not yield a measurable gain due to current limitations in expressive speech synthesis. When combined, the two best performing strategies produce complementary improvements, establishing new state-of-the-art performance in both monolingual and cross-lingual settings.
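
Mechanically, the LLM-based label-smoothing strategy reduces to blending the annotator's one-hot label with a soft distribution over co-occurring emotions. In the sketch below the soft distribution is hard-coded rather than produced by Qwen3-4B, and the blending weight alpha is an assumption.

```python
# Sketch of LLM-based label smoothing: mix a one-hot annotation with a
# soft emotion distribution (here hypothetical; in the paper it comes
# from prompting an LLM to detect subtle emotional cues).
import numpy as np

emotions = ["neutral", "happy", "sad", "angry"]
one_hot = np.array([0.0, 1.0, 0.0, 0.0])      # annotator says "happy"
llm_soft = np.array([0.1, 0.6, 0.05, 0.25])   # hypothetical LLM output

alpha = 0.3                                   # smoothing strength (assumed)
target = (1 - alpha) * one_hot + alpha * llm_soft
target /= target.sum()                        # keep it a distribution
print(dict(zip(emotions, target.round(3))))
```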

  • Research Article
  • 10.1007/s10586-025-05830-y
Automatic speech emotion recognition for Arabic dialects: a new dataset and machine learning framework
  • Nov 11, 2025
  • Cluster Computing
  • Zineddine Sarhani Kahhoul + 6 more

  • Research Article
  • 10.3390/fi17110509
Frame and Utterance Emotional Alignment for Speech Emotion Recognition
  • Nov 5, 2025
  • Future Internet
  • Seounghoon Byun + 1 more

Speech Emotion Recognition (SER) is important for applications such as Human–Computer Interaction (HCI) and emotion-aware services. Traditional SER models rely on utterance-level labels, aggregating frame-level representations through pooling operations. However, emotional states can vary across frames within an utterance, making it difficult for models to learn consistent and robust representations. To address this issue, we propose two auxiliary loss functions, Emotional Attention Loss (EAL) and Frame-to-Utterance Alignment Loss (FUAL). The proposed approach uses a Classification token (CLS) self-attention pooling mechanism, where the CLS summarizes the entire utterance sequence. EAL encourages frames of the same emotion to align closely with the CLS while separating frames of different classes, and FUAL enforces consistency between frame-level and utterance-level predictions to stabilize training. Model training proceeds in two stages: Stage 1 fine-tunes the wav2vec 2.0 backbone with Cross-Entropy (CE) loss to obtain stable frame embeddings, and Stage 2 jointly optimizes CE, EAL, and FUAL within the CLS-based pooling framework. Experiments on the IEMOCAP four-class dataset demonstrate that our method consistently outperforms baseline models, showing that the proposed losses effectively address representation inconsistencies and improve SER performance. This work advances Artificial Intelligence by improving the ability of models to understand human emotions through speech.
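
A rough reading of the two auxiliary losses can be written down directly. The sketch below is one interpretation of the abstract (a cosine-alignment form for EAL and a KL consistency term for FUAL), not the authors' code; the shapes, random embeddings, and shared classifier head are all assumptions.

```python
# NumPy sketch: FUAL penalizes disagreement between the average frame
# prediction and the utterance (CLS) prediction; EAL pulls frame
# embeddings toward the CLS summary (class-wise terms omitted).
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(3)
T, d, C = 50, 32, 4                  # frames, embed dim, emotion classes
frames = rng.normal(size=(T, d))     # frame embeddings (stand-ins)
cls = rng.normal(size=d)             # CLS summary from attention pooling
W = rng.normal(size=(d, C))          # shared classifier head (assumed)

frame_probs = softmax(frames @ W)    # (T, C) frame-level predictions
utt_probs = softmax(cls @ W)         # (C,) utterance-level prediction

# FUAL-style term: KL(utterance || mean frame prediction).
mean_frame = frame_probs.mean(axis=0)
fual = np.sum(utt_probs * np.log(utt_probs / mean_frame))

# EAL-style term: encourage frames to align with the CLS direction.
cos = frames @ cls / (np.linalg.norm(frames, axis=1) * np.linalg.norm(cls))
eal = (1 - cos).mean()

print(round(float(fual), 4), round(float(eal), 4))
```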
