Abstract

We present an algorithm that uses convolutive non-negative matrix factorization (CNMF) to create noise-robust features for automatic speech recognition (ASR). In noise-robust ASR, CNMF is typically used to remove noise from noisy speech prior to feature extraction. However, we find that denoising introduces distortion and artifacts, which can degrade ASR performance. Instead, we propose using the time-activation matrices from CNMF as acoustic model features. In this paper, we describe how to create speech and noise dictionaries that generate noise-robust time-activation matrices from noisy speech. Using the time-activation matrices created by our proposed algorithm, we achieve an 11.8% relative improvement in word error rate on the Aurora 4 corpus compared to using log-mel filterbank energies. Furthermore, we attain a 13.8% relative improvement over log-mel filterbank energies alone when we combine them with our proposed features, indicating that our features carry information complementary to log-mel features.
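To make the idea of a time-activation matrix concrete, the following is a minimal sketch of how activations might be estimated from a magnitude spectrogram when the dictionaries are held fixed. It is not the algorithm of this paper: the abstract does not specify the cost function or how the dictionaries are trained, so the sketch assumes a squared-error cost with standard multiplicative updates, and the function names (`reconstruct`, `cnmf_activations`), the array layout, and the random placeholder dictionaries are all illustrative assumptions.

```python
import numpy as np

def shift_right(X, t):
    """Shift the columns of X right by t frames, zero-filling on the left."""
    if t == 0:
        return X.copy()
    out = np.zeros_like(X)
    out[:, t:] = X[:, :-t]
    return out

def shift_left(X, t):
    """Shift the columns of X left by t frames, zero-filling on the right."""
    if t == 0:
        return X.copy()
    out = np.zeros_like(X)
    out[:, :-t] = X[:, t:]
    return out

def reconstruct(W, H):
    """CNMF model: V_hat = sum_t W[t] @ shift_right(H, t).

    W has shape (T, F, K): T time slices, F frequency bins, K bases.
    H has shape (K, N): one activation row per basis, N frames.
    """
    return sum(W[t] @ shift_right(H, t) for t in range(W.shape[0]))

def cnmf_activations(V, W, n_iter=100, eps=1e-12, seed=0):
    """Estimate non-negative activations H for a fixed dictionary W.

    Assumption: squared-error cost with multiplicative updates; the paper
    may use a different cost or additional constraints.
    """
    rng = np.random.default_rng(seed)
    T, F, K = W.shape
    H = rng.random((K, V.shape[1])) + eps
    for _ in range(n_iter):
        V_hat = reconstruct(W, H)
        num = sum(W[t].T @ shift_left(V, t) for t in range(T))
        den = sum(W[t].T @ shift_left(V_hat, t) for t in range(T)) + eps
        H *= num / den
    return H

# Placeholder dictionaries (random, for illustration only): 4 speech bases
# and 2 noise bases, each spanning 3 frames over 16 frequency bins.
rng = np.random.default_rng(1)
W_speech = rng.random((3, 16, 4))
W_noise = rng.random((3, 16, 2))
W = np.concatenate([W_speech, W_noise], axis=2)

V = rng.random((16, 50))        # stand-in for a noisy magnitude spectrogram
H = cnmf_activations(V, W)      # shape (6, 50): the time-activation matrix
H_speech = H[:4]                # rows tied to the speech bases
```

In a setup like this, the rows of `H` would be passed to the acoustic model in place of, or alongside, log-mel features. Whether the noise rows are kept or discarded is a design choice the abstract does not settle; the slice `H[:4]` only shows one possibility.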
