
Speech Enhancement Research Articles

Overview

2,884 articles published in the last 50 years

Related Topics

  • Speech Enhancement Algorithm
  • Speech Enhancement System

Articles published on Speech Enhancement

2,823 search results, sorted by recency
  • Research Article
  • 10.1016/j.neunet.2025.107805
Lightweight real-time speech enhancement: State-space models and multi-spectral scanning techniques.
  • Nov 1, 2025
  • Neural Networks: the official journal of the International Neural Network Society
  • Xiaodong Zhu + 4 more


  • Research Article
  • 10.1016/j.ijmedinf.2025.106029
Deep learning-based in-ambulance speech recognition and generation of prehospital emergency diagnostic summaries using LLMs.
  • Nov 1, 2025
  • International journal of medical informatics
  • Chen Chen + 8 more


  • Research Article
  • 10.1016/j.specom.2025.103314
LORT: Locally refined convolution and Taylor transformer for monaural speech enhancement
  • Nov 1, 2025
  • Speech Communication
  • Junyu Wang + 5 more


  • Research Article
  • 10.1016/j.apacoust.2025.110844
Exploiting lightweight neural post-filtering for directional speech enhancement
  • Nov 1, 2025
  • Applied Acoustics
  • Tianchi Sun + 5 more


  • Research Article
  • 10.3390/math13213481
Enhancing the MUSE Speech Enhancement Framework with Mamba-Based Architecture and Extended Loss Functions
  • Oct 31, 2025
  • Mathematics
  • Tsung-Jung Li + 1 more

We propose MUSE++, an advanced and lightweight speech enhancement (SE) framework that builds upon the original MUSE architecture by introducing three key improvements: a Mamba-based state space model, dynamic SNR-driven data augmentation, and an augmented multi-objective loss function. First, we replace the original multi-path enhanced Taylor (MET) transformer block with the Mamba architecture, enabling substantial reductions in model complexity and parameter count while maintaining robust enhancement capability. Second, we adopt a dynamic training strategy that varies the signal-to-noise ratios (SNRs) across diverse speech samples, promoting improved generalization to real-world acoustic scenarios. Third, we expand the model’s loss framework with additional objective measures, allowing the model to be empirically tuned towards both perceptual and objective SE metrics. Comprehensive experiments conducted on the VoiceBank-DEMAND dataset demonstrate that MUSE++ delivers consistently superior performance across standard evaluation metrics, including PESQ, CSIG, CBAK, COVL, SSNR, and STOI, while reducing the number of model parameters by over 65% compared to the baseline. These results highlight MUSE++ as a highly efficient and effective solution for speech enhancement, particularly in resource-constrained and real-time deployment scenarios.
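
A note on the dynamic SNR-driven augmentation: in its simplest form, each training example is created by rescaling a noise clip so the mixture lands at a freshly sampled signal-to-noise ratio. A minimal NumPy sketch of that general technique (the function name and SNR range are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def mix_at_random_snr(clean, noise, snr_range=(-5.0, 15.0), rng=None):
    """Mix clean speech with noise at a uniformly sampled SNR (dB).

    Sketch of dynamic SNR-driven augmentation: the target SNR is
    redrawn on every call, so each epoch sees different mixtures.
    """
    rng = rng or np.random.default_rng()
    snr_db = rng.uniform(*snr_range)            # fresh SNR per sample
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12   # guard against silence
    # Scale noise so 10*log10(clean_power / scaled_noise_power) == snr_db
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise, snr_db
```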

  • Research Article
  • 10.3397/in_2025_1074529
Design and development of a planar MEMS microphone array
  • Oct 22, 2025
  • INTER-NOISE and NOISE-CON Congress and Conference Proceedings
  • Damjan Pecioski + 5 more

In recent years, imaging an audio source has become a widely used tool in many fields, such as speech recognition and enhancement, and especially in noise localization in different environments. Acoustic cameras are widely used for noise source investigation; however, an acoustic image is difficult to obtain in environments with a large amount of noise and reverberation. An effective way to overcome this is to couple beamforming theory with a microphone array to obtain a recording of the desired acoustic signal. The aim of this work is to design and develop a low-cost microphone array, based on digital MEMS microphones and a Raspberry Pi, that can be upgraded with a camera into an acoustic camera. Different formations of the microphone array are developed, tested, and compared. The details of the hardware integration as well as the development environment are discussed in this paper.
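
As a primer on the beamforming theory such arrays rely on, the simplest variant is delay-and-sum: time-align each microphone channel toward a chosen direction, then average. A NumPy sketch under far-field plane-wave assumptions (a generic illustration of the technique, not this paper's implementation):

```python
import numpy as np

def delay_and_sum(signals, mic_positions, direction, fs, c=343.0):
    """Frequency-domain delay-and-sum beamformer for a far-field source.

    signals:       (n_mics, n_samples) time-domain recordings
    mic_positions: (n_mics, 3) coordinates in metres
    direction:     unit vector pointing from the array toward the source
    """
    n_mics, n_samples = signals.shape
    # A plane wave from `direction` reaches mics closer to the source
    # earlier; this projection gives each mic's arrival-time advance.
    advances = mic_positions @ direction / c            # seconds
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectra = np.fft.rfft(signals, axis=1)
    # Undo each channel's advance, then average coherently.
    alignment = np.exp(-2j * np.pi * np.outer(advances, freqs))
    return np.fft.irfft(np.mean(spectra * alignment, axis=0), n=n_samples)
```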

  • Research Article
  • 10.3390/sym17101768
Symmetric Combined Convolution with Convolutional Long Short-Term Memory for Monaural Speech Enhancement
  • Oct 20, 2025
  • Symmetry
  • Yang Xian + 4 more

Deep neural network-based approaches have made remarkable progress in monaural speech enhancement. Nevertheless, current cutting-edge approaches remain vulnerable to complex acoustic scenarios. We propose a Symmetric Combined Convolution Network with ConvLSTM (SCCN) for monaural speech enhancement. Specifically, the Combined Convolution Block uses parallel convolution branches, including a standard convolution and two different depthwise separable convolutions, to reinforce feature extraction along the depth and channel dimensions. Similarly, Combined Deconvolution Blocks are stacked to construct the convolutional decoder. Moreover, we introduce exponentially increasing dilation between convolutional kernel elements in the encoder and decoder, which expands the receptive fields. Meanwhile, grouped ConvLSTM layers are exploited to capture the interdependency of spatial and temporal information. The experimental results demonstrate that the proposed SCCN method obtains on average 86.00% in STOI and 2.43 in PESQ, outperforming state-of-the-art baseline methods and confirming its effectiveness in enhancing speech quality.
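
STOI and PESQ scores like those reported above can be computed for one's own enhanced audio with the widely used pystoi and pesq Python packages; a minimal sketch (the package choice and file names are assumptions on my part, not tools named by the paper):

```python
import soundfile as sf
from pesq import pesq      # pip install pesq
from pystoi import stoi    # pip install pystoi

clean, fs = sf.read("clean.wav")       # reference speech
enhanced, _ = sf.read("enhanced.wav")  # model output at the same rate

# PESQ: 'wb' is wideband mode, defined for 16 kHz audio
pesq_score = pesq(fs, clean, enhanced, "wb")
# STOI: returns a value in roughly [0, 1]; often reported as a percentage
stoi_score = stoi(clean, enhanced, fs, extended=False)
print(f"PESQ = {pesq_score:.2f}, STOI = {100 * stoi_score:.2f}%")
```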

  • Research Article
  • 10.1044/2025_aja-25-00059
Verification of an Amplification Strategy to Enhance Soft Speech for Adults With Severe-to-Profound Hearing Loss.
  • Oct 13, 2025
  • American journal of audiology
  • Hsuan Yun Huang + 4 more

This study investigated the effects of a soft speech enhancement algorithm on distant speech perception for adults with severe-to-profound hearing loss (SPHL), examining speech intelligibility, listening effort, and sound quality. Participants were 16 Mandarin-speaking adults (13 men, 3 women; mean age = 58 years) with symmetrical severe-to-profound sensorineural hearing loss. They had at least 1 year of hearing aid experience. A within-subject experimental design compared two hearing aid conditions: with the Speech Enhancer algorithm activated and deactivated. Speech intelligibility was assessed using the Mandarin Chinese matrix sentence test at individual speech reception thresholds. Subjective listening effort was measured using a categorical rating scale for speech presented at three distances (2, 4, and 8 m). Sound quality ratings were collected for loudness, speech understanding, and overall impression using a visual analog scale. Activation of the speech enhancement algorithm led to a notable increase in speech intelligibility from 45% to 67%. Subjective listening effort decreased significantly with the algorithm activated at all distances, with greater benefits observed at farther distances. Similarly, sound quality ratings were significantly higher with the algorithm on for all attributes across all distances, with the largest improvements in overall impression ratings at greater distances. The soft speech enhancement algorithm significantly improved speech intelligibility, reduced listening effort, and enhanced sound quality for distant speech perception in Mandarin-speaking adults with SPHL. These findings suggest that targeted signal processing for soft speech can provide substantial benefits for individuals with SPHL, including speakers of tonal languages, potentially improving communication in challenging listening situations.

  • Research Article
  • 10.48084/etasr.11071
Biorthogonal Wavelet Packet and Adaptive Filters for Noisy Speech Reduction
  • Oct 6, 2025
  • Engineering, Technology & Applied Science Research
  • C Shraddha + 2 more

Minimizing noise in speech signals is crucial for applications such as speech recognition and enhancement. This paper proposes a hybrid technique that combines a biorthogonal wavelet packet with a Recursive Least Squares (RLS) adaptive filter to reduce environmental and colored noise during the preprocessing stage. Simulation results demonstrate a 2–10% improvement in speech signal strength under noisy conditions. The biorthogonal wavelet's vanishing moments and the length of the RLS filter play key roles in preserving speech characteristics while suppressing noise. Performance is evaluated using Signal-to-Noise Ratio (SNR), Mean Squared Error (MSE), and Peak Signal-to-Noise Ratio (PSNR) metrics, showing effective reduction of pink and babble noise across varying decibel levels, thereby enhancing the clarity of speech for recognition applications.
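
For readers unfamiliar with the RLS half of this hybrid, a textbook recursive least squares adaptive filter looks as follows in NumPy (a generic sketch; the paper's filter length, forgetting factor, and wavelet stage are not reproduced here):

```python
import numpy as np

def rls_denoise(noisy, reference, order=16, lam=0.999, delta=1e-2):
    """Textbook RLS adaptive filter: predict the noise from a
    reference channel and subtract it from the noisy signal.

    lam   : forgetting factor (close to 1 = long memory)
    delta : initial inverse-correlation scaling
    """
    w = np.zeros(order)                 # filter weights
    P = np.eye(order) / delta           # inverse correlation estimate
    out = np.zeros_like(noisy)
    for n in range(order, len(noisy)):
        u = reference[n - order:n][::-1]    # most recent reference samples
        k = P @ u / (lam + u @ P @ u)       # gain vector
        e = noisy[n] - w @ u                # error = enhanced output sample
        w = w + k * e                       # weight update
        P = (P - np.outer(k, u @ P)) / lam  # inverse correlation update
        out[n] = e
    return out
```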

  • Research Article
  • 10.17485/ijst/v18i35.2678
Development of a Robust Foreground Speech Enhancement Module for Sub-Optimal Data
  • Oct 3, 2025
  • Indian Journal Of Science And Technology
  • Debabrata Gogoi + 1 more

Objectives: This work aims to enhance foreground speech by effectively removing unwanted background noise and recovering the desired signal, using deep learning approaches with limited training data. Methods: This study addresses this issue with a transfer learning-based technique built on mel-spectrograms. Specifically, it proposes a transfer learning approach that builds on a pre-trained residual network (based on the wav2vec2 model) and includes a statistics pooling layer as used in speaker recognition. The model is then trained on a limited amount of clean and noisy data. In addition, we adopt a log mel-spectrogram feature extraction technique to improve the generalization of speech enhancement models. The databases used here are the Noisy Speech Database curated by Valentini-Botinhao (2017) and the LibriSpeech corpus. Findings: Using the same dataset, the performance of the baseline autoencoder and multilayer autoencoder models was compared with the proposed model. The proposed approach, with an STOI score of 0.88 and an SNR improvement of 3.27 dB, outperforms both baseline models in subjective and objective evaluations. Novelty: This work eliminates signal truncation, a constraint of conventional speech enhancement pipelines, by integrating a statistics pooling layer with a pre-trained wav2vec2-based residual network for variable-length input handling. Furthermore, the model's robustness and flexibility are enhanced by the use of log mel-spectrograms, allowing it to produce state-of-the-art results even with sparse supervised training data. Keywords: Denoise, Mel-spectrogram, Signal Processing, Transfer Learning, Wav2Vec2
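
The statistics pooling layer mentioned above, borrowed from speaker recognition, is what lets the network accept variable-length input: it collapses any number of frames into a fixed-size vector by concatenating the per-channel temporal mean and standard deviation. A minimal PyTorch sketch of the standard layer (an illustration, not the authors' code):

```python
import torch
import torch.nn as nn

class StatisticsPooling(nn.Module):
    """Map (batch, time, channels) -> (batch, 2 * channels) by
    concatenating the temporal mean and standard deviation, so
    inputs of any length yield a fixed-size embedding."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mean = x.mean(dim=1)
        std = x.std(dim=1, unbiased=False).clamp_min(1e-6)  # numerical floor
        return torch.cat([mean, std], dim=1)

frames = torch.randn(4, 237, 512)         # 237 frames; any length works
print(StatisticsPooling()(frames).shape)  # torch.Size([4, 1024])
```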

  • Research Article
  • 10.17743/jaes.2022.0222
An Automatic Mixing Speech Enhancement System for Information Integrity
  • Oct 3, 2025
  • Journal of the Audio Engineering Society
  • Xiaojing Liu + 2 more

The simultaneous presence of multiple audio signals can lead to information loss due to auditory masking and interference, often resulting in diminished signal clarity. The authors propose a speech enhancement system designed to present multiple tracks of speech information with reduced auditory masking, thereby enabling more effective discernment of multiple simultaneous talkers. The system evaluates auditory masking using the ITU-R BS.1387 Perceptual Evaluation of Audio Quality model along with ideal mask ratio metrics. To achieve optimal results, a combined iterative Harmony Search algorithm and integer optimization are employed, applying audio effects such as level balancing, equalization, dynamic range compression, and spatialization, aimed at minimizing masking. Objective and subjective listening tests demonstrate that the proposed system performs competitively against mixes created by professional sound engineers and surpasses existing automixing systems. This system is applicable in various communication scenarios, including teleconferencing, in-game voice communication, and live streaming.

  • Research Article
  • 10.1121/10.0039557
CtPuLSE: Close-talk, and pseudo-label based far-field, speech enhancement.
  • Oct 1, 2025
  • The Journal of the Acoustical Society of America
  • Zhong-Qiu Wang

The currently dominant approach to neural speech enhancement is purely supervised deep learning on simulated pairs of far-field noisy-reverberant speech (i.e., mixtures) and clean speech. The trained models, however, often exhibit limited generalizability to real-recorded mixtures. To deal with this, this paper investigates training enhancement models directly on real mixtures. A major difficulty with this approach is that, since the clean speech underlying real mixtures is unavailable, good supervision for real mixtures is lacking. In this context, assuming that a training set consisting of real-recorded pairs of close-talk and far-field mixtures is available, we propose to address this difficulty via close-talk speech enhancement: an enhancement model is first trained on simulated mixtures to enhance real-recorded close-talk mixtures, and the estimated close-talk speech can then be used as a supervision signal (i.e., pseudo-label) for training far-field speech enhancement models directly on the paired real-recorded far-field mixtures. We name the proposed system ctPuLSE. Evaluation results on the popular CHiME-4 dataset show that ctPuLSE can derive high-quality pseudo-labels and yield far-field speech enhancement models with strong generalizability to real data.
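
Schematically, the pseudo-label stage is an ordinary supervised loop whose target is the estimated close-talk speech rather than unavailable clean speech. A PyTorch-style sketch under assumed names (model, close_talk_enhancer, the dataloader, and the loss choice are all illustrative, not the paper's code):

```python
import torch

def train_far_field(model, close_talk_enhancer, loader, optimizer, loss_fn):
    """One epoch of pseudo-label training: a frozen close-talk
    enhancer supplies the target for the far-field model."""
    close_talk_enhancer.eval()
    for far_mix, close_mix in loader:  # paired real recordings
        with torch.no_grad():
            pseudo_label = close_talk_enhancer(close_mix)  # estimated clean speech
        estimate = model(far_mix)
        loss = loss_fn(estimate, pseudo_label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```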

  • Research Article
  • 10.1016/j.neucom.2025.130798
Cross-architecture knowledge distillation for speech enhancement: From CMGAN to Unet
  • Oct 1, 2025
  • Neurocomputing
  • Khanh Nguyen + 1 more


  • Research Article
  • 10.1049/icp.2025.2881
CrossModal-DEAF: a cross-modal adaptive fusion network for robust speech enhancement
  • Oct 1, 2025
  • IET Conference Proceedings
  • Jinhui Wang + 2 more


  • Research Article
  • 10.1016/j.inffus.2025.103218
Cross-attention among spectrum, waveform and SSL representations with bidirectional knowledge distillation for speech enhancement
  • Oct 1, 2025
  • Information Fusion
  • Hang Chen + 7 more


  • Research Article
  • 10.1080/17483107.2025.2559188
AI-driven neural implants for vision and hearing: a qualitative study of user perspectives
  • Sep 26, 2025
  • Disability and Rehabilitation: Assistive Technology
  • Odile C Van Stuijvenberg + 5 more

Neural implants are being developed to treat various conditions, including sensory impairments such as blindness and deafness. In these technologies, artificial intelligence (AI) plays a growing role in enabling the interpretation of complex data input. Current users of cochlear implants (CIs) face challenges in noisy environments, prompting the development of AI-driven software for personalized and context-aware noise suppression and speech enhancement. For blindness, an AI-driven cortical visual neural implant (cVNI) for artificial visual perception is under development; here, AI-driven software may be used to process camera imaging for interfacing with the brain. If successful, these devices can offer important advantages for their users, yet they may also have ethical implications. The perspectives of (potential) users of these technologies are an important source for ethical analysis, yet so far they have not been explored in depth. We performed a focus-group and interview study including potential users of (a) the AI-driven cVNI (n = 5) and (b) the AI-driven CI (n = 3), as well as (c) current or former users of a retinal implant (n = 3). Focus groups and interviews were transcribed and analyzed thematically. Perspectives were clustered under (1) expectations and experiences, including improvements from the status quo, enhancement of autonomy, and design requirements, and (2) perceived risks and anticipated disadvantages, including uncertainty about effectiveness, operational risks, surgical risks, and media attention. AI-driven neural implants for vision and hearing were positively received by potential users because of their potential to improve autonomy. However, possible conditions for uptake were identified, including device aesthetics and sufficient levels of user control.

  • Research Article
  • 10.1145/3749463
AccCall: Enhancing Real-time Phone Call Quality with Smartphone's Built-in Accelerometer
  • Sep 3, 2025
  • Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
  • Lei Wang + 7 more

Speech enhancement can greatly improve the user experience during phone calls in low signal-to-noise ratio (SNR) scenarios. In this paper, we propose a low-cost, energy-efficient, and environment-independent speech enhancement system, AccCall, that improves phone call quality using the smartphone's built-in accelerometer. However, a significant gap remains between the underlying insight and its practical application, as several critical challenges must be addressed, including efficient speech enhancement in cross-user scenarios, adaptive system triggering to reduce energy consumption, and lightweight deployment for real-time processing. To this end, we first design the Acc-Aided Network (AccNet), a cross-modal deep learning model inherently capable of cross-user generalization through three key components: a cross-modal fusion module, an accelerometer-aided (acc-aided) mask generator, and a unified loss function. Second, we adopt a machine learning-based approach instead of deep learning to achieve high accuracy in distinguishing call activity states, followed by adaptive system triggering, ensuring lower energy consumption and efficient deployment on mobile platforms. Finally, we propose a knowledge-distillation-driven structured pruning framework that optimizes model efficiency while preserving performance. Extensive experiments with 20 participants were conducted under a user-independent scenario. The results show that AccCall achieves excellent and reliable adaptive triggering performance and enables substantial real-time improvements in SI-SDR, SI-SNR, STOI, PESQ, and WER, demonstrating the superiority of our system in enhancing speech quality and intelligibility for phone calls.
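
The knowledge-distillation ingredient can be sketched as a weighted blend of the ordinary task loss and a teacher-matching term; the MSE matching term and the weighting below are illustrative assumptions rather than the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out, target, alpha=0.5):
    """Blend the ordinary task loss with a term that pulls the pruned
    student's output toward the full teacher's output."""
    task = F.mse_loss(student_out, target)        # fit the clean target
    match = F.mse_loss(student_out, teacher_out)  # imitate the teacher
    return alpha * task + (1.0 - alpha) * match
```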

  • Research Article
  • 10.3390/s25175484
A Robust Bilinear Framework for Real-Time Speech Separation and Dereverberation in Wearable Augmented Reality
  • Sep 3, 2025
  • Sensors (Basel, Switzerland)
  • Alon Nemirovsky + 2 more

This paper presents a bilinear framework for real-time speech source separation and dereverberation tailored to wearable augmented reality devices operating in dynamic acoustic environments. Using the Speech Enhancement for Augmented Reality (SPEAR) Challenge dataset, we perform extensive validation with real-world recordings and review key algorithmic parameters, including the forgetting factor and regularization. To enhance robustness against direction-of-arrival (DOA) estimation errors caused by head movements and localization uncertainty, we propose a region-of-interest (ROI) beamformer that replaces conventional point-source steering. Additionally, we introduce a multi-constraint beamforming design capable of simultaneously preserving multiple sources or suppressing known undesired sources. Experimental results demonstrate that ROI-based steering significantly improves robustness to localization errors while maintaining effective noise and reverberation suppression. However, this comes at the cost of increased high-frequency leakage from both desired and undesired sources. The multi-constraint formulation further enhances source separation with a modest trade-off in noise reduction. The proposed integration of ROI and LCMP within the low-complexity frameworks, validated comprehensively on the SPEAR dataset, offers a practical and efficient solution for real-time audio enhancement in wearable augmented reality systems.
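
The multi-constraint (LCMP/LCMV) design mentioned above has the well-known closed form w = R⁻¹C(CᴴR⁻¹C)⁻¹f, where R is the noise covariance, the columns of C are steering vectors (one per constraint), and f holds the desired responses. A per-frequency-bin NumPy sketch of that formula (a generic illustration, not the paper's low-complexity implementation):

```python
import numpy as np

def lcmv_weights(R, C, f):
    """Closed-form LCMV/LCMP beamformer for one frequency bin.

    R : (n_mics, n_mics) noise(+interference) covariance matrix
    C : (n_mics, n_constraints) steering vectors, one column per constraint
    f : (n_constraints,) desired responses, e.g. [1, 0] to keep one
        source distortionless while nulling another
    """
    Rinv_C = np.linalg.solve(R, C)  # R^{-1} C without explicit inversion
    return Rinv_C @ np.linalg.solve(C.conj().T @ Rinv_C, f)
```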

  • Open Access
  • Research Article
  • Cited by 2
  • 10.1016/j.neunet.2025.107562
Explicit estimation of magnitude and phase spectra in parallel for high-quality speech enhancement.
  • Sep 1, 2025
  • Neural Networks: the official journal of the International Neural Network Society
  • Ye-Xin Lu + 2 more


  • Research Article
  • 10.1016/j.compbiomed.2025.110940
Early stroke diagnosis and evaluation based on pathological voice classification using speech enhancement.
  • Sep 1, 2025
  • Computers in biology and medicine
  • Jun Zhang + 7 more

