Abstract
The analysis of real-world conversational signal-to-noise ratios (SNRs) can provide insight into people's communicative strategies and difficulties and guide the development of hearing devices. However, measuring SNRs accurately is challenging in everyday recording conditions, in which only a mixture of sound sources can be captured. This study introduces a method for accurate in situ SNR estimation in which the speech signal of a target talker in natural conversation is captured by a cheek-mounted microphone, adjusted for free-field conditions, and convolved with a measured impulse response to estimate its power at the receiving talker. A microphone near the receiver provides the noise-only component through voice activity detection. The method is applied to in situ recordings of conversations in two real-world sound scenarios. It is shown that the broadband speech level and SNR distributions are estimated more accurately by the proposed method than by a typical single-channel method, especially in challenging, low-SNR environments. The application of the proposed two-channel method may yield more realistic estimates of conversational SNRs and provide valuable input to hearing instrument processing strategies whose operating points are determined by accurate SNR estimates.
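The two-channel procedure outlined in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the free-field adjustment is modeled as a hypothetical FIR filter `h_ff`, the talker-to-receiver impulse response `h_rir` is assumed to be already measured, the energy-threshold `energy_vad` is a placeholder for whatever voice activity detector is actually used, and all function names are hypothetical. Signals are assumed to be time-aligned and of equal length, power is computed over non-overlapping frames, and the propagation delay between the channels is ignored.

```python
import numpy as np


def frame_power(x, frame_len):
    """Mean-square power of consecutive, non-overlapping frames (trailing samples dropped)."""
    n_frames = len(x) // frame_len
    frames = np.asarray(x[: n_frames * frame_len], dtype=float).reshape(n_frames, frame_len)
    return np.mean(frames ** 2, axis=1)


def energy_vad(x, frame_len, threshold_db=-40.0):
    """Placeholder VAD (assumption): a frame counts as speech if its level lies
    within |threshold_db| dB of the loudest frame."""
    level_db = 10.0 * np.log10(frame_power(x, frame_len) + 1e-12)
    return level_db > level_db.max() + threshold_db


def two_channel_snr(cheek, receiver_mic, h_ff, h_rir, frame_len=1024, other_active=None):
    """Frame-wise broadband SNR (dB) of the target talker at the receiver.

    cheek        -- target talker's cheek-microphone signal
    receiver_mic -- microphone signal recorded near the receiving talker
                    (assumed time-aligned with, and as long as, `cheek`)
    h_ff         -- hypothetical FIR filter mapping the cheek position to free field
    h_rir        -- measured impulse response from target talker to receiver
    other_active -- optional per-frame VAD mask of the receiving talker's own speech
    """
    n = len(cheek)
    # 1) Free-field adjustment of the close-talk speech signal.
    speech_ff = np.convolve(cheek, h_ff)[:n]
    # 2) Propagate to the receiver with the measured impulse response.
    speech_rx = np.convolve(speech_ff, h_rir)[:n]
    # 3) VAD on the cheek microphone (propagation delay between channels ignored).
    active = energy_vad(cheek, frame_len)
    noise_only = ~active if other_active is None else ~(active | other_active)
    if not noise_only.any():
        raise ValueError("no noise-only frames available for the noise estimate")
    # 4) Noise power: receiver-microphone frames in which neither talker is active.
    noise_power = np.mean(frame_power(receiver_mic[:n], frame_len)[noise_only])
    # 5) Estimated speech power at the receiver in speech-active frames -> SNR in dB.
    speech_power = frame_power(speech_rx, frame_len)[active]
    return 10.0 * np.log10(speech_power / noise_power)
```

The function returns one SNR value per speech-active frame, so the distribution can be summarized with, for example, `np.percentile(snr_db, [10, 50, 90])`. The VAD is run on the cheek microphone here because a close-talk microphone presumably captures the target talker well above the background; reporting speech levels in dB SPL would additionally require calibrated microphones, which this sketch does not model.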
Summary
Speech communication is a complex phenomenon that combines auditory, visual, and cognitive processes to enable people to transmit and receive information. Such conversations often take place against noisy backgrounds, in which a speech source of interest, i.e., the target talker signal, is accompanied by interfering sources (e.g., noise or competing talkers) and reverberation. In the study by Smeds et al. (2015), hearing aid (HA) recordings (Wagener et al., 2008) obtained by HA users in various situations of their daily lives were analyzed.