Abstract
Despite the established importance of temporal fine structure (TFS) for speech perception in noise, existing speech transmission metrics rely primarily on envelope information to model variance in speech intelligibility. This study proposes a new physical metric that predicts speech intelligibility from the Hilbert-derived TFS waveform. The metric makes explicit use of the coherence information contained in the complex spectra of the Hilbert-derived TFS waveforms of the clean and corrupted speech signals. It then assesses how far this coherence is degraded by linear or nonlinear processing of the stimulus (e.g., noise distortion or speech enhancement). This approach was found to significantly improve the predictive power of the intelligibility measure for both noise-distorted and noise-suppressed speech signals. The measure was evaluated against speech recognition scores obtained from normal-hearing listeners across sixty-four noise-suppressed conditions with nonlinear distortions and eight noisy conditions without subsequent noise reduction. In this evaluation, the proposed TFS-based measure predicted speech intelligibility better than most envelope- and coherence-based measures. Correlations remained high for all maskers tested, reaching a maximum of r = 0.95 in the car and street noise conditions.
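The abstract does not specify how the TFS coherence is computed, so the following is only a minimal sketch of the general idea, under stated assumptions. It treats the whole signal as one broadband channel (no auditory filterbank). It defines the Hilbert TFS as the cosine of the instantaneous phase of the analytic signal. It estimates magnitude-squared coherence (MSC) between clean and corrupted TFS by averaging Hann-windowed frame spectra (Welch-style), then averages MSC over frequency into a single score. The names `hilbert_tfs`, `msc` and `tfs_coherence_score`, and the frame parameters, are hypothetical and are not the authors' published metric. That metric may use band-specific analysis and a mapping from coherence to intelligibility.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT (same construction as scipy.signal.hilbert)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0          # Nyquist bin kept once for even lengths
        h[1:n // 2] = 2.0        # double positive frequencies
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)    # negative frequencies are zeroed


def hilbert_tfs(x):
    """Assumed TFS definition: cosine of the Hilbert instantaneous phase."""
    return np.cos(np.angle(analytic_signal(x)))


def msc(x, y, frame_len=256, hop=128):
    """Magnitude-squared coherence from Hann-windowed, frame-averaged spectra."""
    win = np.hanning(frame_len)
    starts = range(0, len(x) - frame_len + 1, hop)
    X = np.array([np.fft.rfft(win * x[s:s + frame_len]) for s in starts])
    Y = np.array([np.fft.rfft(win * y[s:s + frame_len]) for s in starts])
    sxy = np.mean(X * np.conj(Y), axis=0)    # averaged cross-spectrum
    sxx = np.mean(np.abs(X) ** 2, axis=0)    # averaged auto-spectra
    syy = np.mean(np.abs(Y) ** 2, axis=0)
    return np.abs(sxy) ** 2 / (sxx * syy + 1e-12)  # in [0, 1] per bin


def tfs_coherence_score(clean, corrupted, frame_len=256, hop=128):
    """Hypothetical scalar index: mean MSC between clean and corrupted TFS."""
    c = msc(hilbert_tfs(clean), hilbert_tfs(corrupted), frame_len, hop)
    return float(np.mean(c))


if __name__ == "__main__":
    # White noise stands in for a speech signal purely for illustration.
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(32000)
    noise = rng.standard_normal(32000)
    for snr_db in (10, 0, -10):
        noisy = clean + noise * 10 ** (-snr_db / 20)
        print(snr_db, "dB:", round(tfs_coherence_score(clean, noisy), 3))
```

In this sketch the score is close to 1 when the corrupted signal's fine structure matches the clean one and falls toward 0 as additive noise randomizes the instantaneous phase. This is the kind of coherence degradation the proposed measure is described as tracking.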