A speech pre-processing method to reduce overlap masking in reverberant environments

Julian Grosse, Steven Van De Par

doi:10.1121/1.4969339

Abstract

In daily life, we are often exposed to speech that is rendered over loudspeakers in a reverberant acoustical environment (for example, public-address systems in train stations or conference halls). Whereas the early reflections can support speech intelligibility, the late reflections smear the speech in time, which results in an overlap of consecutive speech portions and an effective low-pass filtering of the speech-specific modulation spectrum. This study proposes a perceptually motivated pre-processing approach, based on previous work of Hodoshima et al. [J. Acoust. Soc. Am. 119, 4055-4064 (2006)], which reduces the detrimental effects of reverberation by suppressing steady-state portions and emphasizing potentially inaudible or masked segments of speech before the speech is emitted into the reverberant environment. For pre-processing, the impulse response is separated into a direct and a reverberant path to decide whether speech segments are inaudible and can be neglected or should be emphasized. A speech intelligibility prediction model is used to select the optimal pre-processing parameters for each specific acoustical environment. Listening tests showed that this pre-processing approach can partially compensate for the detrimental effects of reverberation, leading to a reduction in speech reception thresholds of about 2 to 5 dB measured in speech-shaped noise.
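The separation of the impulse response into a direct and a reverberant path can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state where the split is made, so the 50 ms boundary after the direct-sound peak, the function names, and the synthetic impulse response below are all assumptions chosen for illustration.

```python
import math
import random


def split_impulse_response(h, fs, split_ms=50.0):
    """Split impulse response h into (direct_part, reverberant_part).

    Both outputs have the same length as h, so their sample-wise sum is h.
    split_ms is a hypothetical boundary, measured from the direct-sound
    peak; the source does not specify this value.
    """
    # Take the largest absolute sample as the direct sound.
    peak = max(range(len(h)), key=lambda n: abs(h[n]))
    boundary = min(len(h), peak + int(round(split_ms * 1e-3 * fs)))
    direct = [x if n < boundary else 0.0 for n, x in enumerate(h)]
    reverberant = [x if n >= boundary else 0.0 for n, x in enumerate(h)]
    return direct, reverberant


def energy_ratio_db(direct, reverberant):
    """Energy ratio of the two paths in dB (similar in spirit to clarity C50).

    Assumes the reverberant part has non-zero energy.
    """
    e_direct = sum(x * x for x in direct)
    e_reverb = sum(x * x for x in reverberant)
    return 10.0 * math.log10(e_direct / e_reverb)


# Toy impulse response (assumed, not measured): a unit direct sound
# followed by a noise tail that decays by about 60 dB over one second.
fs = 16000
rng = random.Random(0)
h = [0.0] * fs
h[80] = 1.0  # direct sound arrives after 5 ms
for n in range(81, fs):
    h[n] = 0.05 * rng.gauss(0.0, 1.0) * math.exp(-6.9 * (n - 80) / fs)

direct, reverberant = split_impulse_response(h, fs)
ratio_db = energy_ratio_db(direct, reverberant)
```

Convolving a speech signal with each part separately would give the direct-path and reverberant-path signals whose relative levels could then be compared segment by segment. How that comparison drives the suppression or emphasis of segments is described in the paper itself and is not reproduced here.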

Full Text