Study on intelligibility improvement method based on subband waveform processing focusing on dynamic feature of speech

Hiroki Kohara, Kensaku Asahi, Hideki Banno

doi:10.1121/1.4969514

Abstract

This paper describes a method for improving the intelligibility of speech signals based on subband waveform processing. Our approach builds on the observation that clear speech has higher delta-cepstrum values in the transient parts between phonemes; it emphasizes the delta-cepstrum of the input speech with a filter in the cepstral domain that amplifies a particular modulation frequency. However, because this approach generates synthetic sound with an analysis/synthesis system, the quality of the generated sound is sometimes degraded. To prevent this degradation, a subband waveform-based method is introduced. This method divides the input signal into several subband signals with a quadrature mirror filter (QMF), which enables approximately perfect reconstruction of the input signal from the subband signals; converts the amplification gain sequence in the cepstral domain into one in the subband-waveform domain; and then multiplies each subband signal by the converted gain sequence on a sample-by-sample basis. Synthetic sounds were generated with the method using two, four, and eight subbands. We found that the sound generated with two subbands contains artificial power fluctuation, and that increasing the number of subbands reduces this fluctuation and improves the quality of the generated sound.
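The subband step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the simplest two-band QMF pair (the Haar filters, which give exact rather than approximate reconstruction), and the function names `haar_analysis`, `haar_synthesis`, and `apply_subband_gains` are hypothetical. The gain sequences are taken as given inputs; the conversion from cepstral-domain gains to subband-domain gains is not specified in the abstract and is not modeled here.

```python
import numpy as np

# Assumption: a two-band Haar filter bank stands in for the QMF bank.
# The abstract does not state which QMF design the authors used.

def haar_analysis(x):
    """Split an even-length signal into low and high subbands (half length each)."""
    x = np.asarray(x, dtype=float)
    if x.size % 2:
        raise ValueError("signal length must be even")
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)
    high = (even - odd) / np.sqrt(2.0)
    return low, high

def haar_synthesis(low, high):
    """Rebuild the full-rate signal from the two subband signals."""
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    y = np.empty(2 * low.size)
    y[0::2] = (low + high) / np.sqrt(2.0)
    y[1::2] = (low - high) / np.sqrt(2.0)
    return y

def apply_subband_gains(x, gain_low, gain_high):
    """Multiply each subband by its gain sequence sample by sample, then resynthesize.

    gain_low and gain_high are hypothetical per-sample gain sequences at the
    subband rate (one value per subband sample).
    """
    low, high = haar_analysis(x)
    return haar_synthesis(low * gain_low, high * gain_high)

# With unity gains the filter bank returns the input unchanged.
x = np.sin(2 * np.pi * 0.05 * np.arange(16))
unity = np.ones(x.size // 2)
y = apply_subband_gains(x, unity, unity)
```

Under these assumptions, four or eight subbands could be obtained by applying the same split again to each subband, although the abstract does not say how the authors structured their larger filter banks.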
