Abstract

In cocktail-party listening with spatially separated speech sources, better-ear listening is known to contribute substantially to speech intelligibility. The better ear is generally defined as the ear that receives the higher signal-to-noise ratio (SNR). Usually, this SNR is computed over the entire duration of the signal. This seems inappropriate for speech signals, however, since these are highly modulated in both the time and the frequency domain. Perceptually, modulated maskers yield higher target speech intelligibility than their unmodulated counterparts because they leave glimpses, that is, spectrotemporal regions in which the target dominates the masker. We introduce a simple measure that quantifies the better-ear advantage while taking these spectrotemporal fluctuations into account. In a headphone experiment, three simultaneous sequences of vowel-consonant-vowel utterances were presented at a fixed target-to-masker ratio. The stimuli were rendered with head-related transfer functions and contrasted with stimuli that contained no interaural level differences (ILDs) and therefore allowed no better-ear listening. Using the proposed metric, we can explain differences in intelligibility for these speech-in-speech mixtures, both with and without ILDs, that the conventional SNR leaves unexplained.
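
The abstract does not spell out the proposed metric. The sketch below is a hedged illustration of the underlying idea only, not the authors' measure. It contrasts a conventional long-term better-ear SNR with a hypothetical spectrotemporal variant. That variant picks the better ear separately in each time-frequency tile and reports the fraction of tiles in which the target dominates, a glimpse proportion. Several choices here are assumptions: the STFT settings, the 0 dB criterion, the function names, the synthetic white-noise signals, and the broadband 6 dB gains that stand in for HRTF-induced ILDs.

```python
import numpy as np


def stft_power(x, n_fft=512, hop=256):
    """Power spectrogram (frames x frequency bins) with a Hann window."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def long_term_better_ear_snr(t_left, m_left, t_right, m_right):
    """Conventional measure: SNR over the whole signal, better ear chosen once."""
    snr_left = 10 * np.log10(np.sum(t_left**2) / np.sum(m_left**2))
    snr_right = 10 * np.log10(np.sum(t_right**2) / np.sum(m_right**2))
    return max(snr_left, snr_right)


def better_ear_glimpse_proportion(t_left, m_left, t_right, m_right,
                                  threshold_db=0.0, eps=1e-12):
    """Hypothetical measure: better ear chosen per time-frequency tile.

    Returns the fraction of tiles whose better-ear local SNR exceeds
    threshold_db (an assumed criterion, not taken from the paper).
    """
    snr_left = 10 * np.log10((stft_power(t_left) + eps) / (stft_power(m_left) + eps))
    snr_right = 10 * np.log10((stft_power(t_right) + eps) / (stft_power(m_right) + eps))
    better = np.maximum(snr_left, snr_right)  # tile-wise better ear
    return float(np.mean(better > threshold_db))


# Synthetic demo signals (white noise, not speech).
rng = np.random.default_rng(0)
n = 2**15
target = rng.standard_normal(n)
steady = rng.standard_normal(n)
# On/off gated masker with the same expected energy as the steady one.
gate = np.where((np.arange(n) // 2048) % 2 == 0, np.sqrt(2.0), 0.0)
gated = rng.standard_normal(n) * gate

# Diotic presentation (identical ears, so no ILDs and no better-ear benefit).
snr_steady = long_term_better_ear_snr(target, steady, target, steady)
snr_gated = long_term_better_ear_snr(target, gated, target, gated)
glimpse_steady = better_ear_glimpse_proportion(target, steady, target, steady)
glimpse_gated = better_ear_glimpse_proportion(target, gated, target, gated)

# Crude stand-in for ILDs: target 6 dB weaker at the right ear,
# masker 6 dB weaker at the left ear.
g = 10 ** (-6 / 20)
snr_ild = long_term_better_ear_snr(target, g * steady, g * target, steady)

print(f"long-term SNR: steady {snr_steady:.2f} dB, gated {snr_gated:.2f} dB")
print(f"glimpse proportion: steady {glimpse_steady:.2f}, gated {glimpse_gated:.2f}")
print(f"long-term better-ear SNR with ILDs: {snr_ild:.2f} dB")
```

With a steady and an on/off-gated masker of equal energy, the long-term SNR is close to 0 dB in both cases. The tile-wise measure is clearly higher for the gated masker. This is the kind of difference a whole-signal SNR cannot express. Choosing the better ear per tile rather than once per signal also lets the two ears contribute different glimpses when ILDs vary across time and frequency.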
