Abstract

In natural environments, the performance of automatic speech recognition systems is often degraded by background noise. Noise data augmentation is commonly used to improve the robustness of acoustic models; however, training on noise-corrupted audio may degrade the acoustic model's performance on clean audio. In this paper, we propose an adversarial training approach with gated convolutional neural networks for robust speech recognition. We use generative adversarial networks and gated convolutional neural networks to encourage the acoustic model to learn noise-invariant representations; specifically, the first several layers of the acoustic model serve as the generator. Systematic experiments on AISHELL-1 show that adversarial training with gated convolutional neural networks improves the robustness of the acoustic model in noisy environments while also improving its performance in quiet environments. Compared with simple noise data augmentation, adversarial training with gated convolutional neural networks reduces the average relative error rate by 4.4% on clean test data and 5.6% on noisy test data.
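The gating mechanism behind gated convolutional networks is the gated linear unit (GLU), in which one linear path is modulated element-wise by a sigmoid gate computed from a parallel path. The following is a minimal NumPy sketch of that mechanism only; it is illustrative and not the paper's implementation, and the function and parameter names (`glu`, `w_a`, `w_b`) are hypothetical.

```python
import numpy as np

def glu(x, w_a, b_a, w_b, b_b):
    """Gated linear unit: a linear path modulated by a sigmoid gate.

    x:          input features, shape (batch, in_dim)
    w_a, b_a:   weights/bias of the linear (content) path
    w_b, b_b:   weights/bias of the gate path
    """
    a = x @ w_a + b_a                       # content path
    g = 1.0 / (1.0 + np.exp(-(x @ w_b + b_b)))  # sigmoid gate in (0, 1)
    return a * g                            # gate controls how much content passes

# Toy usage: a strongly negative gate bias drives the gate toward 0,
# so the content path is almost entirely suppressed.
x = np.ones((2, 3))
w_a, b_a = np.eye(3), np.zeros(3)
w_b, b_b = np.eye(3), np.full(3, -100.0)
y = glu(x, w_a, b_a, w_b, b_b)
```

In full gated convolutional networks the two paths are convolutions over time rather than dense projections, but the gating itself works the same way.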
