Abstract

This work investigates the use of subglottal resonances (SGRs) for speaker normalization in noisy environments. Based on our previous work, a noise-robust algorithm is developed for estimating the first three SGRs from speech signals; it achieves robustness by factoring the short-term (or local) signal-to-noise ratio into the estimation process. The SGR estimates provided by this algorithm are refined by applying maximum-likelihood (ML) corrections, and are used in a non-linear frequency-warping technique that we recently developed. This SGR-based normalization (SN) scheme is evaluated on the AURORA-4 database in clean and noisy conditions. Using power-normalized cepstral coefficients (PNCCs) as front-end features, SN reduces the average word error rate by 8.7% relative to ML-based vocal-tract length normalization (VTLN). A fast version of SN (without ML corrections of SGR estimates) is also found to outperform VTLN (by 5.9% relative); it is computationally less complex than VTLN and hence a potential alternative for real-time applications.
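To make the normalization idea concrete, below is a minimal illustrative sketch of SGR-anchored frequency warping: the speaker's estimated SGRs are mapped onto a set of reference SGR values via a piecewise-linear warp of the frequency axis. The reference values, the piecewise-linear form, and the Nyquist anchoring are assumptions for illustration only; they stand in for the non-linear warping function and ML-corrected SGR estimates developed in the paper.

```python
import numpy as np

# Illustrative reference SGR values in Hz (assumed, not taken from the paper).
REFERENCE_SGRS = np.array([600.0, 1400.0, 2200.0])

def sgr_warp_axis(freqs_hz, estimated_sgrs,
                  reference_sgrs=REFERENCE_SGRS, nyquist_hz=8000.0):
    """Piecewise-linear frequency warp anchored at the speaker's SGRs.

    Each estimated SGR is mapped onto the corresponding reference SGR,
    with the warp pinned to the identity at 0 Hz and at the Nyquist
    frequency. This is only a stand-in for the non-linear warping
    used in the paper's SGR-based normalization (SN) scheme.
    """
    est = np.asarray(estimated_sgrs, dtype=float)
    ref = np.asarray(reference_sgrs, dtype=float)
    src = np.concatenate(([0.0], est, [nyquist_hz]))   # speaker's frequency axis
    dst = np.concatenate(([0.0], ref, [nyquist_hz]))   # normalized frequency axis
    return np.interp(freqs_hz, src, dst)

# Example: warp mel-filterbank centre frequencies for a speaker whose
# estimated SGRs sit below the reference values.
centre_freqs = np.linspace(100.0, 7000.0, 12)
warped = sgr_warp_axis(centre_freqs, estimated_sgrs=[560.0, 1320.0, 2100.0])
print(np.round(warped, 1))
```

In practice such a warp would be applied to the filterbank centre frequencies before computing front-end features (e.g., the PNCCs used in the paper's experiments), so that speaker-dependent spectral scaling is reduced before recognition.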
