TS-RIR: Translated Synthetic Room Impulse Responses for Speech Augmentation

Anton Ratnarajah, Dinesh Manocha, Zhenyu Tang

doi:10.1109/asru51503.2021.9688304

Abstract

We present a method for improving the quality of synthetic room impulse responses (RIRs) for far-field speech recognition. We bridge the gap between the fidelity of synthetic RIRs and real RIRs using our novel TS-RIRGAN architecture. Given a synthetic RIR in the form of raw audio, we use TS-RIRGAN to translate it into a real RIR. We also perform real-world sub-band room equalization on the translated synthetic RIR. Our overall approach improves the quality of synthetic RIRs by compensating for low-frequency wave effects, similar to those in real RIRs. We evaluate the improved synthetic RIRs on a far-field speech dataset created by convolving the LibriSpeech clean speech dataset [1] with RIRs and adding background noise. We show that far-field speech augmented using our improved synthetic RIRs reduces the word error rate by up to 19.9% on the Kaldi far-field automatic speech recognition benchmark [2].
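The augmentation step described in the abstract (convolving clean speech with an RIR and adding background noise) can be illustrated with a minimal NumPy sketch. This is not the authors' Kaldi pipeline: the function name `augment_far_field`, the `snr_db` parameter, and the choice to truncate the reverberant signal to the clean signal's length are assumptions made for illustration only.

```python
import numpy as np


def augment_far_field(clean, rir, noise, snr_db):
    """Hypothetical helper: reverberate clean speech with an RIR, then add noise.

    All arguments are 1-D float arrays at the same sample rate; snr_db is the
    assumed target ratio of reverberant-speech power to noise power, in dB.
    """
    # Reverberation: linear convolution with the RIR, truncated to the
    # length of the clean utterance (an assumption of this sketch).
    reverberant = np.convolve(clean, rir, mode="full")[: len(clean)]

    # Tile or trim the noise so that it covers the whole utterance.
    reps = int(np.ceil(len(reverberant) / len(noise)))
    noise = np.tile(noise, reps)[: len(reverberant)]

    # Scale the noise so the speech-to-noise power ratio equals snr_db.
    speech_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return reverberant + gain * noise


# Toy usage with random stand-ins for speech, RIR, and noise (not real data).
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)  # 1 s of "speech" at 16 kHz
rir = rng.standard_normal(4000) * np.exp(-np.arange(4000) / 800.0)  # decaying tail
noise = rng.standard_normal(8000)
far_field = augment_far_field(clean, rir, noise, snr_db=10.0)
```

In this sketch the RIR passed in could be either a raw synthetic RIR or an improved one; the abstract's claim concerns how the choice of RIR affects the downstream word error rate, which this toy example does not measure.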

Similar Papers

IR-GAN: Room Impulse Response Generator for Far-Field Speech Recognition
Anton Ratnarajah ... Dinesh Manocha
30 Aug 2021

An MTF-based blind restoration of temporal power envelopes as a front-end processor for automatic speech recognition systems in reverberant environments
Xugang Lu ... Masashi Unoki
The Journal of the Acoustical Society of America | VOL. 123
01 May 2008

Front-end for far-field speech recognition based on frequency domain linear prediction
Sriram Ganapathy ... Samuel Thomas
22 Sep 2008

A study on data augmentation of reverberant speech for robust speech recognition
Tom Ko ... Vijayaditya Peddinti
01 Mar 2017
