Related Topics
Articles published on Speech Intelligibility
8693 Search results
- New
- Research Article
- 10.3390/acoustics8010016
- Mar 3, 2026
- Acoustics
- Teo Poldrugovac + 2 more
The Church of St. Francis in Pula, Croatia, is a well-preserved example of Franciscan gothic sacral architecture from the late 13th century. As preaching was highly valued by the Franciscan order as a way of communicating with the faithful, the study is focused on determining whether speech intelligibility in the church would have been adequate for successful communication between priests and their audience. The archaeoacoustic analysis of the church was performed in four stages: (1) in situ acoustic measurements in the present state, (2) development and calibration of the model of the present state based on measurement results, (3) development of the two models of the presumed historical state based on the calibrated model and historical data, and (4) prediction of acoustic conditions in the present and the historical states in terms of reverberation time T30 and of speech intelligibility in terms of speech transmission index STI. The factors considered in the study were (1) acoustics of the church, (2) profile of the audience (friars and the faithful), (3) layout of the audience areas (choir area in the front of the nave for the friars, back area of the nave for the faithful), (4) positions of the speech sources (altar for addressing the friars, pulpit for addressing the faithful), (5) occupancy (unoccupied and fully occupied church), (6) language used in liturgical ceremonies (Latin and native language), and (7) language proficiency of the audience (native speakers, users of a second language). 
The results show that (1) fair speech intelligibility (STI ≥ 0.45 for the faithful as native speakers, STI ≥ 0.50 for friars as non-native speakers of Latin) can be achieved for 50% of the audience in the choir area and for the entire audience in the back area in favourable conditions (fully occupied church, audience addressed from dedicated speaker positions), (2) the position of the pulpit (close to the audience and considerably elevated above it) is more favourable than the position of the altar (remote, barely elevated above the audience), and (3) in unoccupied conditions, fair speech intelligibility can still be achieved in at least 50% of the back audience area with the faithful gathered close to the pulpit, while it is not possible for the front audience area addressed from the altar. The summary conclusion is that the church of St. Francis in its presumed historical layout(s) would fulfil its primary function in a limited capacity. Fair speech intelligibility would likely have been sufficient for the audience to follow liturgical ceremonies conducted in the church, but not without difficulty.
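The STI cut-offs used in the study line up with the standard qualification bands of IEC 60268-16; a minimal sketch of that mapping (band edges taken from the standard; the function name is illustrative):

```python
def sti_band(sti: float) -> str:
    """Map a speech transmission index (STI) value to the qualitative
    bands of IEC 60268-16: bad / poor / fair / good / excellent."""
    if not 0.0 <= sti <= 1.0:
        raise ValueError("STI must lie in [0, 1]")
    if sti < 0.30:
        return "bad"
    if sti < 0.45:
        return "poor"
    if sti < 0.60:
        return "fair"
    if sti < 0.75:
        return "good"
    return "excellent"

# Thresholds quoted in the study: fair intelligibility from STI >= 0.45
# for native listeners, STI >= 0.50 for non-native listeners of Latin.
print(sti_band(0.47))  # fair
```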
- New
- Research Article
- 10.1007/s00106-026-01747-z
- Feb 27, 2026
- HNO
- A Bohnert + 4 more
The Mainz Audiometric Test for Children (MATCH) was originally developed for children aged 3-7 years and validated in quiet conditions. The present study aimed to extend the MATCH to testing in noise, to establish normative data, and to validate the test under these conditions. A total of 103 children aged 3 years and 3 months to 7 years and 7 months participated, including 76 with normal hearing and 27 with a hearing impairment. Recruitment took place via local kindergartens (mainly normal-hearing children) and the Department of Phoniatry and Pediatric Audiology at the University Medical Center Mainz (mainly hearing-impaired children). To account for age-specific differences, children were divided into three groups: (1) < 4.5 years (n = 28), (2) 4.5-5.5 years (n = 35), (3) > 5.5 years (n = 40). Testing was performed monaurally using a touchscreen-based picture-pointing task, with adaptive determination of the speech reception threshold (SRT) in noise. The SRTs improved with age: -4.6 dB SNR (< 4.5 years), -7.4 dB SNR (4.5-5.5 years), and -9.3 dB SNR (> 5.5 years) at 71.4% speech intelligibility. The slope of the psychometric functions increased with age, while variability and test duration decreased. Test-retest reliability was high (r = 0.84), and results correlated significantly with pure-tone audiometry (r = 0.75). The MATCH in noise is a reliable and child-appropriate tool for assessing speech reception thresholds as early as preschool age. It addresses a methodological gap in existing procedures, which typically provide valid results only from school age onwards, and enables early and differentiated evaluation of hearing aid outcomes in children with hearing impairment.
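The adaptive SRT determination described above can be sketched as a transformed staircase. A generic 2-down/1-up rule converges near 70.7% correct, close to the study's 71.4% target point; the study's actual rule, step sizes, and stopping criterion are not specified here, so all names and values below are illustrative:

```python
def staircase_srt(respond, start_snr=0.0, step=2.0, reversals_needed=6):
    """Generic 2-down/1-up adaptive staircase: the SNR drops after two
    consecutive correct trials and rises after each error, converging
    near 70.7% correct.  `respond(snr)` returns True for a correct
    trial.  The SRT estimate is the mean SNR at the reversal points."""
    snr, streak, last_dir = start_snr, 0, None
    reversals = []
    while len(reversals) < reversals_needed:
        if respond(snr):
            streak += 1
            if streak < 2:
                continue            # need two corrects before stepping down
            streak, direction = 0, -1
        else:
            streak, direction = 0, +1
        if last_dir is not None and direction != last_dir:
            reversals.append(snr)   # track direction changes
        last_dir = direction
        snr += direction * step
    return sum(reversals) / len(reversals)

# Toy deterministic listener with a threshold near -8 dB SNR.
srt = staircase_srt(lambda snr: snr > -8.0)  # -> -7.0 for this toy listener
```

A real test would replace the toy responder with the trial procedure (play a word at the given SNR, score the child's picture-pointing response).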
- New
- Research Article
- 10.18623/rvd.v23.n4.5079
- Feb 27, 2026
- Veredas do Direito
- Amreen Raheem + 5 more
Background: Articulation disorders are common in children and refer to difficulties in producing speech sounds. These disorders often involve placement and manner errors that can impact speech intelligibility. Previous studies have explored these errors in various populations, but the prevalence and types of errors in children aged 3-12 years remain under examined, particularly in non-Western languages. Objective: The aim of this study was to assess the common placement and manner errors in children aged 3-12 years diagnosed with articulation disorders. Methods: A descriptive cross-sectional survey was conducted in government and private hospitals and rehabilitation centers. A total of 183 children were included through purposive sampling. Data was collected using an articulation screening tool, and statistical analysis was performed using SPSS version 25. Results: The most common placement error was alveolar (38.8%), followed by palatal (24.6%). Fricatives were the most frequent manner error (41.5%), and most errors occurred in the initial position of words (56.8%). Statistical analysis revealed significant associations between placement errors and age, and between manner errors and gender. Conclusion: The study identifies key articulation errors in children, with placement errors primarily involving alveolar and palatal sounds, and manner errors predominantly involving fricatives. These findings highlight the need for early intervention and targeted therapy to address these speech challenges.
- New
- Research Article
- 10.1159/000550626
- Feb 20, 2026
- Audiology & neuro-otology
- Anna Ratuszniak + 4 more
When a bilaterally deaf person receives a unilateral cochlear implant, they may still experience, along with significant benefits, certain limitations. The use of a CROS (contralateral routing of signal) system, which transmits the signal from the deaf side to the side with the speech processor, creates an opportunity to reduce these limitations and improve hearing performance in difficult acoustic conditions. In this study, a wireless CROS solution from Advanced Bionics specifically designed for cochlear implant (CI) speech processors is investigated. Speech-in-noise tests based on monosyllabic word tests were given to 15 CI users with CROS switched on and off. Three spatially different listening setups were used to probe three binaural effects (binaural redundancy, head shadow, and squelch), together with spatial release from masking. The mean age of users was 66.2 years (SD = 10.6), and all participants had ≥9 months of experience with their CI. The speech-in-noise tests revealed improved speech intelligibility in some test conditions when using the CROS device compared to listening with just a unilateral CI. No spatial release from masking was observed. The investigated CROS system is a valuable addition to unilateral CI systems in cases where bilateral implantation is not an option.
- New
- Research Article
- 10.1007/s00405-025-10000-2
- Feb 16, 2026
- European archives of oto-rhino-laryngology : official journal of the European Federation of Oto-Rhino-Laryngological Societies (EUFOS) : affiliated with the German Society for Oto-Rhino-Laryngology - Head and Neck Surgery
- Fahrettin Deniz Senli + 3 more
This study evaluated the relationship between auditory spectral resolution and listening effort using cochlear implant simulations in 21 normal-hearing participants. In a dual-task paradigm, participants repeated sentences noise-vocoded to 4, 6, 8, and 12 channels, as well as everyday normal speech, while performing a secondary rhyme-judgment task. Listening effort was measured via secondary-task reaction time and pupil dilation. Decreased spectral resolution increased both reaction time and pupil size, indicating greater effort. Reaction time increased significantly only in the most degraded conditions (4 and 6 channels), whereas pupil dilation increased across all degraded conditions compared to intact speech. Speech intelligibility, although affected by degradation, did not predict either of the effort measures. The weak correlation between reaction time and pupil size suggests they capture related but distinct aspects of listening effort. These findings highlight the multidimensional nature of listening effort, demonstrating that physiological measures can reveal increased cognitive load even when behavioral performance is unaffected. Combining these measures is crucial for a comprehensive assessment of the cognitive consequences of perceiving degraded speech.
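Noise vocoding of the kind used in such CI simulations keeps only each band's slow amplitude envelope and re-imposes it on band-limited noise, discarding spectral fine structure. A single-shot FFT sketch (real simulations typically filter frame-by-frame; the band edges, envelope cutoff, and function name here are illustrative, not the study's):

```python
import numpy as np

def noise_vocode(signal, fs, n_channels=8, env_cutoff=50.0, seed=0):
    """Crude FFT-based noise vocoder: split the input into log-spaced
    bands, extract each band's amplitude envelope, and use it to
    modulate band-limited white noise."""
    rng = np.random.default_rng(seed)
    n = len(signal)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    spec = np.fft.rfft(signal)
    # log-spaced band edges between 100 Hz and (at most) the Nyquist rate
    edges = np.geomspace(100.0, min(8000.0, fs / 2.0), n_channels + 1)
    out = np.zeros(n)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_mask = (freqs >= lo) & (freqs < hi)
        band = np.fft.irfft(spec * band_mask, n)
        # envelope: rectify, then low-pass below env_cutoff
        env = np.fft.irfft(np.fft.rfft(np.abs(band)) * (freqs < env_cutoff), n)
        # carrier: white noise restricted to the same band
        carrier = np.fft.irfft(np.fft.rfft(rng.standard_normal(n)) * band_mask, n)
        out += np.clip(env, 0.0, None) * carrier
    return out
```

Dropping `n_channels` from 12 toward 4 progressively smears spectral detail, which is the degradation axis the study manipulated.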
- New
- Research Article
- 10.1097/mao.0000000000004867
- Feb 16, 2026
- Otology & neurotology : official publication of the American Otological Society, American Neurotology Society [and] European Academy of Otology and Neurotology
- Ceren Karaçayli + 2 more
The aim of this study is to assess high-frequency hearing performance in cochlear implantees and its impact on lyric comprehension in musical perception. In this study, 18 cochlear implant users were included. High-frequency hearing thresholds were measured with the implant device in on and off conditions. Patients were divided into 2 categories based on their ability to understand melody and lyrics or melody alone. Audiometric assessments included free-field audiometry, extended high-frequency audiometry, and Categories of Auditory Performance-II. Hearing thresholds at high frequencies improved significantly with the CI activated (P<0.001 across all frequencies) compared with the deactivated state. CI users who could understand both melody and lyrics showed better hearing thresholds at 9 kHz (P=0.007) and higher CAP-II scores (P=0.013) than CI users who could understand the melody but not the lyrics. A moderate positive correlation was observed between CAP-II scores and speech discrimination (ρ=0.629, P=0.003). Cochlear implants may extend auditory stimulation to a limited range of frequencies beyond the typical auditory spectrum. Improved high-frequency hearing, particularly at 9 kHz, is associated with better lyric comprehension in CI users, underscoring the importance of high-frequency hearing in musical perception and speech discrimination. Enhancing cochlear implants with components (such as microphones, speech processors, and transmitters) capable of capturing, analyzing, and delivering high-frequency acoustic signals may contribute to improved music perception and speech intelligibility, particularly in noisy environments.
- New
- Research Article
- 10.1007/s00405-026-10025-1
- Feb 16, 2026
- European archives of oto-rhino-laryngology : official journal of the European Federation of Oto-Rhino-Laryngological Societies (EUFOS) : affiliated with the German Society for Oto-Rhino-Laryngology - Head and Neck Surgery
- Majid Karimi + 5 more
Auditory Neuropathy Spectrum Disorder (ANSD) affects neural transmission of auditory signals and is classified into presynaptic and postsynaptic types. Cochlear implant (CI) outcomes depend on lesion site and stimulation rate, with slower rates potentially benefiting impaired conduction. Seventy children aged 2-7 years with genetically confirmed ANSD received the Nucleus CI system. Participants were grouped as either pre- or post-synaptic and randomly assigned to one of three stimulation rates: 500, 900, or 1800 pulses per second (PPS). Assessments at 3 and 6 months included aided thresholds, Persian versions of the Categories of Auditory Performance (CAP) and Speech Intelligibility Rating (SIR), cortical P1 latency and amplitude, and Electrically Evoked Compound Action Potential (ECAP) via Neural Response Telemetry across apical, mid, and basal electrodes. Despite comparable aided thresholds, presynaptic ANSD cases showed consistent ECAP responses and greater improvements in P1 measures and CAP/SIR scores, particularly at low and moderate rates. ECAP amplitude increased and latency decreased over time in this group, with no significant rate-time interaction. Postsynaptic cases showed limited ECAP responses and smaller functional gains. Lesion site critically influences CI outcomes in ANSD. Presynaptic children demonstrated physiological and functional improvements, especially at lower rates, consistent with preserved neural integrity. Postsynaptic cases showed restricted responses, likely reflecting demyelination. Findings support individualized CI programming guided by genetic and neural profiles.
- New
- Research Article
- 10.1080/14670100.2026.2626195
- Feb 14, 2026
- Cochlear Implants International
- Sarah Meehan + 6 more
Objective: To explore the potential of the internationally-renowned ‘Mini-Mental State Examination’ (MMSE) cognitive measure to explain variability in cochlear implant (CI) recipients’ speech recognition outcomes. The Stroop Color-Word Test (SCWT) was also employed as a measure of cognitive inhibition, an ability essential for focusing on target speech whilst ignoring background noise. The authors hypothesize that MMSE and SCWT scores correlate with CI users’ performance on established speech recognition tests. Methods: Cognitive screening was assessed by the MMSE and SCWT in adult CI users one year postoperatively. In addition, speech recognition was assessed using word and sentence lists, in both quiet and noise. Study sample: 28 participants, postlingually deafened adult CI users, median age 75 years. Results: Total MMSE scores correlated significantly with sentence recognition in noise (r = 0.621, P = .004), although no correlation was identified in quiet. Furthermore, the SCWT incongruent condition correlated significantly with CI users’ speech recognition in noise (r = −0.644, P = .007). Discussion: Global cognition (as assessed using the MMSE), and inhibition-concentration (as assessed using the SCWT), seem to be important factors in influencing CI recipients’ speech intelligibility in noise. This pilot study recommends a larger-scale study; given the global popularity of the MMSE and SCWT as quick cognitive screening tests, they may be useful in CI clinics when speech perception outcomes are unexpectedly poor for older adults and when questions of cognition arise. Conclusion: These standardized cognitive measures may prove helpful in counseling patients and families when coming to terms with CI outcomes and optimizing multidisciplinary rehabilitation strategies.
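The r values reported are Pearson product-moment correlations; for reference, the statistic in plain Python:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient, the statistic
    used to relate MMSE / SCWT scores to speech-in-noise recognition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A value near +0.62 (MMSE vs. sentence recognition in noise) or -0.64 (SCWT incongruent condition) indicates a moderately strong linear relationship in a sample of this size.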
- New
- Research Article
- 10.65138/ijresm.v9i2.3413
- Feb 14, 2026
- International Journal of Research in Engineering, Science and Management
- Parth Gandhi + 2 more
In the field of audio processing, noise interference poses a significant challenge, affecting speech intelligibility and communication quality across multiple domains. Current audio denoising methods often struggle with the delicate balance between noise removal and speech preservation. This paper presents WaveSplit, a novel multi-stage framework for audio enhancement and denoising that addresses these limitations by combining deep learning techniques with psychoacoustic principles and adaptive noise processing. Building upon the CleanUNet architecture, our approach introduces several innovative components: adaptive SNR-based processing, harmonic enhancement that preserves critical speech components, vocal clarity enhancement, and perceptual processing leveraging human hearing characteristics. Evaluations demonstrate that our framework achieves superior performance compared to baseline models, with significant improvements in SNR (76.36 dB compared to 7.20-8.10 dB in baseline models), PESQ scores (1.05 improvement versus 0.77-0.91), and STOI metrics (0.15 versus 0.09-0.13) while reducing the "robotic" artifacts common in traditional methods. This research has significant implications for applications including telecommunications, hearing assistive technologies, content production, and speech recognition systems. By addressing both objective quality metrics and perceptual factors, WaveSplit represents an advancement toward more effective, natural-sounding audio enhancement solutions for real-world environments.
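The SNR figures above are energy ratios on a decibel scale; a minimal reference implementation, assuming the common convention that whatever differs between the processed signal and the clean reference counts as residual noise:

```python
import math

def snr_db(clean, processed):
    """Signal-to-noise ratio in dB: energy of the clean reference over
    the energy of the residual (processed minus clean)."""
    sig = sum(c * c for c in clean)
    noise = sum((p - c) ** 2 for p, c in zip(processed, clean))
    if noise == 0.0:
        return float("inf")   # perfect reconstruction
    return 10.0 * math.log10(sig / noise)
```

By this definition, a 1% amplitude error on every sample already yields 40 dB, which is why SNR alone is usually reported alongside perceptual metrics such as PESQ and STOI.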
- New
- Research Article
- 10.1097/aud.0000000000001794
- Feb 13, 2026
- Ear and hearing
- Erin M Picou + 5 more
The purpose of this study was to evaluate the effects of an advanced digital noise reduction algorithm on measures in the laboratory (double-blind paired comparison testing, unblinded slider setting, sentence recognition performance) and during a field trial (unblinded slider setting). A secondary purpose was to evaluate participants' ability and willingness to use a smartphone application for controlling the algorithm in the field. Laboratory procedures included the evaluation of subjective ratings of listening ease using a double-blinded, paired comparisons approach while listening to speech in background noise. Participants were trained to use a smartphone application for manually controlling the advanced digital noise reduction algorithm and they indicated their preferred setting using the smartphone application, also in the laboratory. In addition, they completed double-blinded, behavioral sentence recognition in noise testing with a variety of advanced noise reduction settings. Finally, participants wore the hearing aids at home during a field trial, with instructions to use the smartphone application during the trial in noisy situations and to report on their experiences using a bespoke questionnaire upon their return to the laboratory. Double-blind, paired comparison testing revealed that most participants (80%) preferred to have advanced digital noise reduction active in the noisy, reverberant laboratory. These participants were also likely to demonstrate a preference for the advanced digital noise reduction to be active during the unblinded preference task. Advanced digital noise reduction did not affect sentence recognition in noise performance. During the field trial, participants could use the smartphone application to adjust the advanced noise reduction strength in noisy situations but did not choose to do so frequently. 
In addition, on average, participants did not activate the advanced digital noise reduction algorithm when in self-identified difficult listening situations during the field trial. The results of the current study demonstrate robust subjective benefits of advanced digital noise reduction activation in the laboratory, with no effects on speech intelligibility. In addition, participants were internally consistent in the laboratory; their self-adjusted settings were consistent with the program they preferred during the double-blind, paired comparisons testing. However, the findings with the smartphone application demonstrate that, in general, they did not activate the advanced digital noise reduction during their self-identified difficult listening situations in the field. This result could partially be explained by the limited reported use of the smartphone application during the field trial. Future study is warranted to reconcile the laboratory and field trial findings in this study. In the interim, a reasonable clinical approach with limited negative speech intelligibility consequences might be to activate advanced digital noise reduction by default and provide smartphone application access in case a patient discovers a preference for an alternative noise reduction strength.
- New
- Research Article
- 10.1007/s10162-026-01031-5
- Feb 13, 2026
- Journal of the Association for Research in Otolaryngology : JARO
- Brian C J Moore
This paper evaluates the current performance of hearing aids, based on research findings and my experiences with hearing aids. The type of acoustic coupling to the ear is important. The fitting can be "closed" (sealing the ear canal), but this can lead to the occlusion effect; the user's own voice sounds too loud or too boomy. Alternatively, the fitting can be open (the eartip has a vent). This alleviates the occlusion effect, but it introduces comb-filtering (heard as perceptual coloration) and leads to little or no gain at low frequencies. Also, the highest frequency at which useful gain can be achieved is often about 5 kHz, which is lower than optimal. While acoustic feedback cancellation systems have improved markedly, they can still introduce artifacts and impair sound quality, especially for music. Hearing aids use multi-channel amplitude compression to compensate for the reduced dynamic range of hearing-impaired people, but they often fail to restore the audibility of soft sounds, especially at high frequencies, and the amount of compression is often limited (and less than indicated by the manufacturers' fitting software), leading to loudness discomfort (and sometimes reduced speech intelligibility) at high sound levels. Also, compression systems introduce cross-modulation, impairing sound quality. Most hearing aids incorporate directional processing and noise-reduction systems intended to improve the ability to understand speech in noisy situations. These systems can be effective with a closed fitting, but much of the benefit is lost with an open fitting because of leakage of background sounds through the vent.
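The multi-channel amplitude compression discussed here applies a level-dependent gain in each channel; one channel's static input/output rule can be sketched as follows (knee point, ratio, and gain values are illustrative defaults, not fitting-software targets):

```python
def wdrc_gain_db(input_db, gain_below=25.0, knee_db=50.0, ratio=2.0):
    """Static gain rule for one wide-dynamic-range compression channel:
    constant (linear) gain below the compression knee, then only
    1/ratio dB of output growth per dB of input above it."""
    if input_db <= knee_db:
        return gain_below
    excess = input_db - knee_db
    # each dB above the knee loses (1 - 1/ratio) dB of gain
    return gain_below - excess * (1.0 - 1.0 / ratio)
```

With a 2:1 ratio, each decibel of input above the knee yields only half a decibel of output growth, which is how the wide acoustic dynamic range is squeezed into the impaired listener's reduced one; set the ratio too high or the knee too low and loud sounds become uncomfortable or distorted, as the paper notes.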
- New
- Research Article
- 10.1044/2025_ajslp-25-00034
- Feb 10, 2026
- American journal of speech-language pathology
- Daniel Kim + 5 more
This study sought to determine the variable effect of three cueing strategies (i.e., loud, clear, or slow speech) on speech intelligibility and perceived speech severity in talkers with dysarthria due to Parkinson's disease (PD). The study also aimed to identify perceptual speech features associated with responses to these speech cues. Eighty-four naive listeners rated speech samples of 52 talkers with PD. Each talker's samples consisted of two sentences produced in habitual, loud, clear, and slow speech. Listeners rated intelligibility and speech severity as perceptual outcome measures using separate visual analog scales. The relative change in intelligibility and speech severity ratings from habitual speech to each speech cue was calculated. Based on the threshold of a meaningful change, talkers were grouped into "positive responders" and "non-positive responders" for each outcome measure and each speech cue. Finally, a profile of perceptual speech features and severity ratings of the articulatory and phonatory subsystem impairment was established for each talker based on their habitual speech to identify potential predictors of the cueing response patterns. For both outcome measures, intelligibility and speech severity, loud speech elicited the most positive responses followed by clear speech and slow speech. Subsystem impairment severity differed between positive responders and non-positive responders. For each cueing strategy, the presence of specific perceptual speech features indicated a likelihood of a positive response. Findings provide an important stepping stone for future research that seeks to advance personalized treatment of dysarthria in talkers with PD.
- Research Article
- 10.3390/a19020134
- Feb 7, 2026
- Algorithms
- Nisreen Talib Abdulhusein + 1 more
Speech enhancement aims to improve speech quality and intelligibility in noisy environments and is important in applications such as hearing aids, mobile communications and automatic speech recognition (ASR). This paper shows a structured review of speech enhancement techniques, classified depending on the channel configuration and signal processing framework. Both traditional and modern approaches are discussed, including classical signal processing methods, machine learning techniques, and recent deep learning-based models. Furthermore, common noise types, widely used speech datasets, and standard evaluation metrics for evaluating speech quality and intelligibility are reviewed. Key challenges such as non-stationary noise, data limitations, reverberation, and generalization to unseen noise conditions are highlighted. This review presents the advancements in speech enhancement and discusses the challenges and trends of this field. Valuable insights are provided for researchers, engineers, and practitioners in the area. The findings aid in the selection of suitable techniques for improved speech quality and intelligibility, and we concluded that the trend in speech enhancement has shifted from standard algorithms to deep learning methods that can efficiently learn information regarding speech signals.
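Of the classical signal processing methods such a review covers, magnitude spectral subtraction is the usual baseline: subtract an estimated noise magnitude spectrum from the noisy one, floor the result to avoid negative magnitudes, and resynthesize with the noisy phase. A single-frame sketch (practical systems work on short overlapping frames with a running noise estimate; the flooring factor is an illustrative choice):

```python
import numpy as np

def spectral_subtract(noisy, noise_est, floor=0.05):
    """Classical magnitude spectral subtraction on one frame:
    clean_mag = max(|noisy| - |noise_est|, floor * |noisy|),
    resynthesized with the noisy signal's phase."""
    spec = np.fft.rfft(noisy)
    mag = np.abs(spec)
    clean_mag = np.maximum(mag - np.abs(np.fft.rfft(noise_est)), floor * mag)
    return np.fft.irfft(clean_mag * np.exp(1j * np.angle(spec)), len(noisy))
```

The spectral floor is the standard mitigation for "musical noise": bins that would go negative are kept at a small fraction of the noisy magnitude rather than zeroed.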
- Research Article
- 10.1097/pts.0000000000001473
- Feb 5, 2026
- Journal of patient safety
- Andrew Michael Armson + 2 more
The increased use of respiratory protective equipment (RPE) during and since the COVID-19 pandemic has highlighted its adverse impact on communication, particularly in high-stakes environments such as anesthesia and surgery, where clear verbal exchange is essential. This study examines how different types and combinations of RPE, including surgical masks, FFP-3 masks, and powered air-purifying respirators (PAPRs), affect speech intelligibility in the anesthetic setting. Twenty-one NHS theater staff participated in speech intelligibility testing conducted in a standard anesthetic room. Performance was assessed using single-word, consonant-nucleus-consonant (CNC) tests under various RPE conditions, including combinations of masks and PAPRs. Significant reductions in word recognition accuracy were observed when speakers wore RPE, with FFP-3 masks producing a more pronounced reduction than surgical masks. Communication was further impaired when listeners used PAPRs, particularly when speakers simultaneously wore FFP-3 masks. In contrast, intelligibility was not significantly affected when speakers used PAPRs alone. RPE, principally devices that obscure the mouth, substantially impairs verbal communication in the anesthetic environment, with FFP-3 masks causing the greatest reduction in speech clarity. PAPRs introduce additional barriers, especially for listeners. To mitigate these effects, health care professionals should select RPE that balances protection with communication needs. Additional strategies, such as reducing background noise, enhancing RPE design, and implementing alternative communication methods, may further improve verbal exchanges in critical care settings.
- Research Article
- 10.4274/jarem.galenos.2025.60252
- Feb 5, 2026
- Journal of Academic Research in Medicine
- Atılım Atılgan + 3 more
Feasibility of the Speech Intelligibility Index in Turkish‑speaking Adults with Sensorineural Hearing Loss
- Research Article
- 10.21053/ceo.2025-00227
- Feb 4, 2026
- Clinical and experimental otorhinolaryngology
- Masafumi Ueno + 7 more
Markedly decreased speech intelligibility is reported to be a major feature of vestibular schwannoma (VS); however, details such as the relationship between speech intelligibility and hearing levels have not yet been adequately clarified. Among 473 patients with sporadic unilateral vestibular schwannoma, scatter plots of pure tone audiometry (PTA) thresholds and speech discrimination scores were created. Simple regression analysis was conducted, and the Pearson correlation coefficient was calculated. The results were compared with those of 173 patients with asymmetric cochlear hearing loss, including the percentage of patients with unserviceable speech intelligibility. In patients with vestibular schwannoma, a strong correlation was found between the speech discrimination score and PTA threshold on the affected side, regardless of Koos grade or audiogram shape. In most patients, there was no significant difference in distribution compared with patients with asymmetric cochlear hearing loss. Although the proportion of patients with unserviceable speech discrimination was higher for vestibular schwannoma than asymmetric cochlear hearing loss across all hearing levels, the condition was limited to a small number of cases; a significant difference was only observed in cases with severe hearing loss. Considering its relationship with hearing level, a markedly decreased speech discrimination score is no longer a major feature of vestibular schwannoma.
- Research Article
- 10.64898/2026.02.02.703242
- Feb 4, 2026
- bioRxiv
- Siddhant Tripathy + 7 more
Hidden hearing loss (HHL) is an auditory neuropathy characterized by altered auditory nerve responses despite normal hearing thresholds. Recent experimental and computational studies suggest that permanent disruptions to heminode positions in spiral ganglion neuron (SGN) fibers can contribute to these deficits. However, the interaction between heminode disruption and noisy backgrounds ubiquitous in daily listening remains unexplored. This study investigates how background noise affects auditory processing with these peripheral disorders and how deficits propagate to downstream sound localization circuits in the superior olivary complex. We developed computational models of SGN fibers with mild and severe degrees of heminode disruption, subjected to sinusoidal tone stimuli in the presence of background noise with varying spectral characteristics. We analyzed the phase-locking of SGN fiber responses to the stimulus tone and modeled the subsequent effects on interaural time difference (ITD) sensitivity in the medial superior olive (MSO) using a binaural localization network. We found that near-tone-frequency noise disrupted SGN phase locking through cycle-to-cycle variability in spike phases, with effects consistent across tone frequencies. Mild heminode disruption produced frequency-dependent degradation in SGN phase locking, with effects observed only at higher frequencies tested (600–1000 Hz), without reducing overall firing rates. Critically, the effects of noise and heminode disruption were additive, with combined exposure leading to reduced ITD sensitivity and large temporal fluctuations in MSO responses. Severe heminode disruption, which additionally reduced firing rates at the SGN fibers and subsequent stages, produced profound localization deficits across all frequencies tested. 
Thus, our model results suggest that noisy environments exacerbate auditory deficits from peripheral disorders implicated in HHL and could potentially impair speech intelligibility through degradation in localization ability. This model may be useful for understanding the downstream impacts of SGN neuropathies.
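Phase locking of the kind analyzed for the SGN fibers is conventionally quantified by vector strength: each spike is projected onto the unit circle at the stimulus frequency, and the length of the mean resultant vector is taken (1 for perfect locking, 0 for none). A minimal implementation:

```python
import cmath
import math

def vector_strength(spike_times, freq):
    """Vector strength of phase locking: length of the mean resultant
    of spike phases at the stimulus frequency (times in seconds,
    frequency in Hz)."""
    if not spike_times:
        return 0.0
    vectors = [cmath.exp(2j * math.pi * freq * t) for t in spike_times]
    return abs(sum(vectors)) / len(vectors)
```

Cycle-to-cycle variability in spike phases, the mechanism by which near-tone-frequency noise degraded SGN responses in the model, shows up directly as a drop in this quantity.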
- Research Article
- 10.36948/ijfmr.2026.v08i01.67991
- Feb 4, 2026
- International Journal For Multidisciplinary Research
- Raghu M + 1 more
This paper presents Whisper-Aware Spectro-Transformer U-Net (WAST-U-Net), a multilingual, emotion-preserving speech enhancement model optimized for automatic speech recognition (ASR). Extending the U-Former backbone, our architecture integrates Transformer blocks at skip connections, emotion and language embeddings at the bottleneck, and a novel Whisper-WER loss that directly optimizes ASR intelligibility. Unlike traditional models that prioritize noise suppression at the cost of expressiveness, WAST-U-Net enhances speech while preserving speaker emotion and linguistic identity. Evaluated on VoiceBank-DEMAND and a Kannada-English code-mixed dataset, our model achieves state-of-the-art performance across PESQ, STOI, SI-SNR, Whisper-WER, and emotion accuracy. Ablation studies confirm the synergistic contribution of each component. This framework sets a new benchmark for multilingual, emotionally intelligent speech enhancement, paving the way for accessible ASR in noisy, real-world environments.
- Research Article
- 10.1016/j.heares.2026.109539
- Feb 1, 2026
- Hearing research
- Iris Van De Ryck + 5 more
EEG-based decoding of auditory attention to conversations with turn-taking speakers.
- Research Article
- 10.1016/j.jvoice.2026.01.049
- Feb 1, 2026
- Journal of voice : official journal of the Voice Foundation
- Michael P Cannito + 4 more
Harmonic Amplitude Differences Before and After Voice Treatment for Parkinson's Disease and Their Relationship to Voice Quality and Speech Intelligibility.