Abstract
19(7), 2125-2136]. This "hybrid" model, named sEPSMcorr, is shown to account for the effects of stationary and fluctuating additive interferers as well as for the effects of non-linear distortions, such as spectral subtraction, phase jitter, and ideal time-frequency segregation (ITFS). The model shows a broader predictive range than both the original mr-sEPSM (which fails in the phase-jitter and ITFS conditions) and STOI (which fails to predict the influence of fluctuating interferers), albeit with lower accuracy than the source models in some individual conditions. Like other models that employ a short-term correlation-based back end, including STOI, the proposed model fails to account for the effects of room reverberation on speech intelligibility. Overall, the model might be valuable for evaluating the effects of a wide range of interferers and distortions on speech intelligibility, including the consequences of hearing impairment and hearing-instrument signal processing.
Highlights
Speech is the main tool used by humans to communicate with one another, making it a key factor in most social interactions
The simulations obtained with the proposed model, sEPSMcorr, are indicated by the filled black circles, and the simulations obtained with the multi-resolution speech-based envelope power spectrum model (mr-sEPSM) and the short-time objective intelligibility (STOI) measure are represented by the gray squares and the dark gray diamonds, respectively
The model operates on the clean unprocessed speech and the noisy mixture and combines the front end of the mr-sEPSM (Jørgensen et al., 2013) with a correlation-based back end similar to the one employed in the STOI measure (Taal et al., 2011)
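The idea of a short-term correlation-based back end can be illustrated with a minimal Python sketch. This is not the sEPSMcorr or STOI implementation: the function name `short_term_correlation`, the non-overlapping segmentation, and the plain averaging across segments are all assumptions made for illustration, and the inputs are assumed to be already-extracted temporal envelopes of the clean speech and the noisy mixture.

```python
import numpy as np

def short_term_correlation(clean_env, noisy_env, seg_len):
    """Mean Pearson correlation between two envelopes over short segments.

    Hypothetical helper: segments are consecutive and non-overlapping,
    and segments with zero variance in either signal are skipped.
    """
    clean_env = np.asarray(clean_env, dtype=float)
    noisy_env = np.asarray(noisy_env, dtype=float)
    n_seg = min(len(clean_env), len(noisy_env)) // seg_len
    scores = []
    for i in range(n_seg):
        c = clean_env[i * seg_len:(i + 1) * seg_len]
        n = noisy_env[i * seg_len:(i + 1) * seg_len]
        # Remove the segment mean so the dot product becomes a correlation.
        c = c - c.mean()
        n = n - n.mean()
        denom = np.linalg.norm(c) * np.linalg.norm(n)
        if denom > 0:
            scores.append(float(np.dot(c, n) / denom))
    # Return 0.0 when no segment carried any envelope fluctuation.
    return float(np.mean(scores)) if scores else 0.0
```

Under these assumptions, a noisy envelope that follows the clean envelope closely within each segment gives a value near 1, while an envelope dominated by an unrelated interferer gives a value near 0.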
Summary
Speech is the main tool used by humans to communicate with one another, making it a key factor in most social interactions. The way in which humans process and decode speech signals has been a focus of research for decades, and various speech perception models have been presented that attempt to quantify the effects of the acoustic properties of the target speech and the interferers, the effects of the environment (e.g., a room) or transmission channel (e.g., a communication device or a hearing instrument), as well as the effects of auditory processing (e.g., a hearing loss) on speech intelligibility. The predictions of the Articulation Index (AI) and the Speech Intelligibility Index (SII) are based on a weighted average of the long-term signal-to-noise ratio (SNR) in different frequency bands, using the clean speech signal and the background noise as inputs. This long-term analysis implies that the models are insensitive to short-term effects, e.g., the ability of human listeners to utilize speech information in the dips of temporally fluctuating maskers, such as interfering speech, often referred to as “listening in the dips” (Festen and Plomp, 1990). Since the Extended SII (ESII) assumes that the clean speech and the noise can be accessed separately, it cannot account for conditions where the speech and noise mixture has been subjected to non-linear processing, such as noise reduction algorithms or amplitude compression schemes (Rhebergen et al., 2009).
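The weighted average of long-term band SNRs that underlies AI- and SII-type predictions can be sketched as follows. This is a deliberately simplified illustration, not the standardized SII procedure: the function name `long_term_index` and the band weights are hypothetical, and the linear mapping of the SNR from a -15 to +15 dB range onto 0 to 1 is an assumed simplification that leaves out the other correction terms of the full method.

```python
import numpy as np

def long_term_index(speech_band_power, noise_band_power, band_weights):
    """Importance-weighted average of per-band long-term SNR (simplified).

    Hypothetical helper: inputs are long-term powers per frequency band
    for the clean speech and the noise, accessed separately.
    """
    speech = np.asarray(speech_band_power, dtype=float)
    noise = np.asarray(noise_band_power, dtype=float)
    w = np.asarray(band_weights, dtype=float)
    w = w / w.sum()  # normalize the (assumed) band-importance weights
    snr_db = 10.0 * np.log10(speech / noise)
    # Assumed mapping: -15 dB -> 0, +15 dB -> 1, clipped outside that range.
    audibility = np.clip((snr_db + 15.0) / 30.0, 0.0, 1.0)
    return float(np.sum(w * audibility))
```

Because the powers are averaged over the whole signal before the SNR is computed, a fluctuating masker and a stationary masker with the same long-term power give the same value here, which is the insensitivity to short-term effects described above.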