Abstract

Predicting masked speech perception typically relies on estimates of the spectral distribution of cues supporting recognition. Current methods for estimating band importance for speech-in-noise use filtered stimuli. These methods are not appropriate for speech-in-speech because filtering can modify stimulus features affecting auditory stream segregation. Here, band importance is estimated by quantifying the relationship between speech recognition accuracy for full-spectrum speech and the target-to-masker ratio by channel at the output of an auditory filterbank. Preliminary results provide support for this approach and indicate that frequencies below 2 kHz may contribute more to speech recognition in two-talker speech than in speech-shaped noise.
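A minimal sketch of the kind of analysis the abstract describes is given below. The filterbank design (fourth-order Butterworth bands at roughly ERB-spaced edges), the channel count, and the logistic-regression link between per-channel target-to-masker ratio (TMR) and trial-level accuracy are illustrative assumptions, not the published analysis.

```python
# Sketch (assumed, not the authors' exact method): relate full-spectrum
# recognition accuracy to per-channel TMRs at the output of a band-pass
# filterbank, and read relative band importance off the fitted weights.
import numpy as np
from scipy.signal import butter, sosfilt
from sklearn.linear_model import LogisticRegression

def erb_spaced_edges(f_lo=100.0, f_hi=7000.0, n_channels=21):
    """Channel edges approximately evenly spaced on an ERB-number scale."""
    erb = lambda f: 21.4 * np.log10(1.0 + 0.00437 * f)
    erb_inv = lambda e: (10.0 ** (e / 21.4) - 1.0) / 0.00437
    return erb_inv(np.linspace(erb(f_lo), erb(f_hi), n_channels + 1))

def per_channel_tmr(target, masker, fs, edges):
    """TMR (dB) in each channel of a Butterworth band-pass filterbank.

    fs must exceed twice the highest channel edge.
    """
    tmrs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        t_rms = np.sqrt(np.mean(sosfilt(sos, target) ** 2))
        m_rms = np.sqrt(np.mean(sosfilt(sos, masker) ** 2))
        tmrs.append(20.0 * np.log10(t_rms / m_rms))
    return np.array(tmrs)

def fit_bif(tmr_matrix, correct):
    """Fit trial accuracy (0/1) on per-channel TMRs; weights index importance."""
    model = LogisticRegression().fit(tmr_matrix, correct)
    return model.coef_.ravel()
```

Under these assumptions, each trial contributes one row of per-channel TMRs and one recognition score, and the fitted regression weights serve as one relative index of band importance across channels.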

Highlights

  • Understanding how masking and audibility affect speech perception is critically important given the ubiquity of competing sound streams in natural listening environments (Hodgson et al., 2007)

  • The amount of acoustic information supporting speech perception is often characterized as the product of the audibility of each frequency band and the importance of the speech cues contained in that band, with additional factors related to presentation level and distortion associated with hearing loss (e.g., ANSI, 1997)

  • The analyses described in this report estimated band importance functions (BIFs) for masked speech recognition and evaluated whether BIFs differ for speech-in-speech as compared to speech-in-noise


Introduction

Understanding how masking and audibility affect speech perception is critically important given the ubiquity of competing sound streams in natural listening environments (Hodgson et al., 2007). The amount of acoustic information supporting speech perception is often characterized as the product of the audibility of each frequency band and the importance of the speech cues contained in that band, with additional factors related to presentation level and distortion associated with hearing loss (e.g., ANSI, 1997). This general approach has been tremendously successful at predicting speech recognition in quiet and in steady noise, with applications in both research and clinical contexts (e.g., Amlani et al., 2002). However, simple models based on audibility as a function of frequency typically fail when the background contains spectrotemporally complex sounds such as competing speech. This is problematic because many everyday listening environments contain multiple speech streams, and the ability to recognize speech under these conditions is important for successful communication (Phatak et al., 2018). Bandwidth requirements are greater for speech-in-speech recognition than for speech-in-noise recognition (Best et al., 2019), which could reflect flatter, more uniform band importance functions (BIFs) for speech-in-speech than for speech-in-noise recognition.
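For orientation, the band-importance framework cited above (ANSI, 1997) can be paraphrased, in simplified form, as an importance-weighted sum of band audibilities; the expression below is a sketch of that general form and omits the standard's additional corrections for presentation level and distortion.

```latex
% Simplified paraphrase of the ANSI (1997) band-importance framework
\mathrm{SII} \approx \sum_{i=1}^{n} I_i \, A_i
% I_i : importance of band i (the BIF, with \sum_i I_i = 1)
% A_i : audibility of band i (ranging from 0 to 1)
```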
