A study on data augmentation of reverberant speech for robust speech recognition

Tom Ko, Michael L Seltzer, Daniel Povey, Sanjeev Khudanpur, Vijayaditya Peddinti

doi:10.1109/icassp.2017.7953152

Abstract

The environmental robustness of DNN-based acoustic models can be significantly improved by using multi-condition training data. However, as data collection is a costly proposition, simulation of the desired conditions is a frequently adopted strategy. In this paper, we detail a data augmentation approach for far-field ASR. We examine the impact of using simulated room impulse responses (RIRs), as real RIRs can be difficult to acquire, and also the effect of adding point-source noises. We find that the performance gap between using simulated and real RIRs can be eliminated when point-source noises are added. Further, we show that the trained acoustic models not only perform well in the distant-talking scenario but also provide better results in the close-talking scenario. We evaluate our approach on several LVCSR tasks that adequately represent both scenarios.
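As a rough illustration of the kind of augmentation the abstract describes, the sketch below convolves clean speech with an RIR and then adds a point-source noise that has been convolved with its own RIR. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names (`reverberate`, `mix_at_snr`, `augment`, `toy_rir`), the SNR-based mixing rule, and the exponentially decaying toy RIR are all hypothetical choices made for this example, and random arrays stand in for real audio.

```python
import numpy as np


def reverberate(signal, rir):
    """Convolve a dry signal with a room impulse response, keeping the input length."""
    return np.convolve(signal, rir)[: len(signal)]


def mix_at_snr(target, noise, snr_db):
    """Add noise to target, scaled so the target-to-noise power ratio is snr_db."""
    # Loop or trim the noise so that it covers the whole target signal.
    reps = int(np.ceil(len(target) / len(noise)))
    noise = np.tile(noise, reps)[: len(target)]
    target_power = np.mean(target ** 2)
    noise_power = np.mean(noise ** 2)
    if target_power == 0 or noise_power == 0:
        raise ValueError("target and noise must have non-zero power")
    scale = np.sqrt(target_power / (noise_power * 10 ** (snr_db / 10)))
    return target + scale * noise


def augment(speech, speech_rir, noise, noise_rir, snr_db):
    """Reverberate speech and a point-source noise separately, then mix them."""
    reverberant_speech = reverberate(speech, speech_rir)
    reverberant_noise = reverberate(noise, noise_rir)
    return mix_at_snr(reverberant_speech, reverberant_noise, snr_db)


def toy_rir(rng, length=4000, decay=0.002):
    """Toy stand-in for an RIR (assumption): exponentially decaying white noise."""
    rir = rng.standard_normal(length) * np.exp(-decay * np.arange(length))
    rir[0] = 1.0  # direct-path component
    return rir


rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)  # placeholder for 1 s of 16 kHz speech
noise = rng.standard_normal(8000)    # placeholder for a point-source noise clip
augmented = augment(speech, toy_rir(rng), noise, toy_rir(rng), snr_db=10.0)
```

In practice the two RIRs would come from the same real or simulated room with different source positions, and the SNR would be sampled per utterance; those details are outside what the abstract states.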

Similar Papers

Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection
Yuichiro Koyama ... Naoya Takahashi
23 May 2022

Study on method for protecting speech privacy by actively controlling speech transmission index in simulated room
Masashi Unoki ... Masato Akagi
01 Dec 2017

Supervised single-channel speech dereverberation and denoising using a two-stage model based sparse representation
Long Zhang ... Zhongfu Ye
Speech Communication | VOL. 97
27 Dec 2017

Low Complexity NLMS for Multiple Loudspeaker Acoustic Echo Canceller Using Relative Loudspeaker Transfer Functions
Ofer Schwartz ... Emanuel A. P. Habets
11 Apr 2020
