Relative intelligibility of dynamically extracted transient versus steady-state components of speech

J R Boston, C C Li, J D Durrant, Kristie Kovacyk, Sungyub Yoo, Amro El‐Jaroudi, Stacey Karn

doi:10.1121/1.4784586

Abstract

Consonants are recognized to dominate the higher frequencies of the speech spectrum and to carry more information than vowels, but both have quasi-steady-state and transient components, such as vowel-to-consonant transitions. Fixed filters separate these effects to some extent, but probably not optimally, given diverse words, speakers, and situations. To enhance the transient characteristics of speech, this study used time-varying adaptive filters [Rao and Kumaresan, IEEE Trans. Speech Audio Process. 8, 240–254 (2000)], following high-pass filtering at 700 Hz (well known to have minimal effect on intelligibility), to extract predominantly steady-state components of speech material (CVC words, NU-6). The transient component was the difference between the sum of the filter outputs and the original signal. Psychometric functions were determined in five subjects with and without background noise and fitted by ogives. The transient components averaged filtered speech energy, but PBmax was not significantly different (nonparametric ANOVA) from that of either the original or high-pass-filtered speech. The steady-state components yielded significantly lower PBmax (p = 0.003) despite their much greater energy, as expected. These results suggest a potential approach to dynamic enhancement of speech intelligibility. [Work supported by ONR.]
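
The decomposition described in the abstract, with the transient taken as what remains after the summed filter outputs are removed from the signal, can be illustrated with a minimal sketch. This is not the authors' implementation: the time-varying adaptive filters of Rao and Kumaresan are not reproduced here. The band outputs are instead assumed to be given (two fixed tones in the toy example), the sign convention (signal minus summed outputs) is an assumption, and the names split_components and energy are hypothetical.

```python
import math


def split_components(signal, band_outputs):
    """Split a signal into steady-state and transient parts.

    steady-state = sample-wise sum of the per-band filter outputs
    transient    = signal - steady-state  (sign convention assumed)
    """
    steady = [sum(samples) for samples in zip(*band_outputs)]
    if len(steady) != len(signal):
        raise ValueError("band outputs must match the signal length")
    transient = [x - s for x, s in zip(signal, steady)]
    return steady, transient


def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


# Toy data (assumed, not from the study): two sustained tones stand in for
# the steady-state filter outputs, and a single-sample click is the transient.
fs = 8000  # sampling rate in Hz, chosen for illustration only
n = 400
tone_a = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(n)]
tone_b = [0.5 * math.sin(2 * math.pi * 2000 * t / fs) for t in range(n)]
click = [1.0 if t == 200 else 0.0 for t in range(n)]
signal = [a + b + c for a, b, c in zip(tone_a, tone_b, click)]

steady, transient = split_components(signal, [tone_a, tone_b])

# Fraction of the signal energy carried by the transient residual.
transient_fraction = energy(transient) / energy(signal)
```

In this toy case the residual recovers the click and carries only a small share of the total energy, which mirrors the kind of energy comparison the abstract makes between transient and steady-state components. The numbers themselves say nothing about the study's results.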
