Creating speaker independent ASR system through prosody modification based data augmentation

S Shahnawazuddin, Nagaraj Adiga, Hemant Kumar Kathania, B Tarun Sai

doi:10.1016/j.patrec.2019.12.019

Abstract

In this paper, the effect of prosody-modification-based data augmentation is explored in the context of automatic speech recognition (ASR). The primary motive is to develop ASR systems that are less affected by speaker-dependent acoustic variations. This paper focuses on two factors that contribute to inter-speaker variability: pitch and speaking rate. To simulate such an ASR task, we trained an ASR system on adults’ speech and tested it on speech from both adult and child speakers. Compared with the adult test case, recognition rates degrade severely when the test speech comes from child speakers. This degradation is attributed mainly to the large differences in pitch and speaking rate between adults’ and children’s speech. To overcome this problem, the pitch and speaking rate of the training speech are modified to create new versions of the data. The original and modified versions are then pooled to capture greater acoustic variability. The ASR system trained on the augmented data is found to be more robust to speaker-dependent variations. Relative improvements of 11.5% and 27.0% over the baseline are obtained on the adults’ and children’s speech test sets, respectively.
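
The augmentation step described in the abstract (modify the pitch and speaking rate of the training speech, then pool the modified copies with the originals) can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions: speaking rate is changed with a crude overlap-add (OLA) time-scale modification, and pitch is changed by OLA stretching followed by linear resampling. It is not the prosody-modification method used by the authors, and the function names, frame/hop sizes and scaling factors are all hypothetical.

```python
import math


def resample_linear(x, factor):
    """Scale the length of signal x by `factor` using linear interpolation.

    Played back at the original sampling rate, factor < 1 compresses the
    waveform in time (raising all frequencies by 1/factor) and factor > 1
    stretches it (lowering them).
    """
    if not x:
        return []
    n_out = max(1, int(round(len(x) * factor)))
    out = []
    for i in range(n_out):
        pos = i / factor                 # fractional position in the input
        j = int(pos)
        frac = pos - j
        a = x[min(j, len(x) - 1)]
        b = x[min(j + 1, len(x) - 1)]
        out.append(a + frac * (b - a))
    return out


def ola_time_stretch(x, rate, frame_len=400, hop_out=200):
    """Change duration by 1/rate with plain overlap-add (OLA).

    rate > 1 gives faster (shorter) speech, rate < 1 slower (longer) speech.
    Frames are read every hop_out * rate samples and written every hop_out
    samples, so local spectra (and hence pitch) are roughly kept.
    frame_len=400 and hop_out=200 are 25 ms / 12.5 ms at 16 kHz (hypothetical).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not x:
        return []
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    hop_in = hop_out * rate
    n_frames = max(1, int((len(x) - frame_len) / hop_in) + 1)
    out_len = (n_frames - 1) * hop_out + frame_len
    y = [0.0] * out_len
    norm = [0.0] * out_len
    for k in range(n_frames):
        start_in = int(round(k * hop_in))
        start_out = k * hop_out
        for n in range(frame_len):
            idx = start_in + n
            s = x[idx] if idx < len(x) else 0.0   # zero-pad past the end
            y[start_out + n] += window[n] * s
            norm[start_out + n] += window[n]
    # Divide by the summed window so the overlap-add has unit gain.
    return [v / w if w > 1e-8 else 0.0 for v, w in zip(y, norm)]


def pitch_shift(x, p):
    """Scale pitch by factor p while keeping duration roughly unchanged:
    stretch the duration by p with OLA, then resample back by 1/p."""
    if p <= 0:
        raise ValueError("p must be positive")
    stretched = ola_time_stretch(x, rate=1.0 / p)
    return resample_linear(stretched, 1.0 / p)


def augment_corpus(corpus, pitch_factors=(0.85, 1.15), rate_factors=(0.85, 1.15)):
    """Pool original utterances with pitch- and speaking-rate-modified copies.

    corpus: list of (utt_id, samples, transcript) tuples.
    The factor values are illustrative placeholders, not the paper's settings.
    Transcripts are reused unchanged: prosody modification does not change
    what was said.
    """
    pooled = list(corpus)
    for utt_id, x, text in corpus:
        for p in pitch_factors:
            pooled.append((f"{utt_id}_pitch{p}", pitch_shift(x, p), text))
        for r in rate_factors:
            pooled.append((f"{utt_id}_rate{r}", ola_time_stretch(x, r), text))
    return pooled
```

Each input utterance yields one original plus one copy per factor, so with the placeholder factors above the pooled set is five times the size of the original. Plain OLA introduces phasing artifacts because frames are not synchronized to the waveform; a practical system would more likely use a higher-quality method such as WSOLA or TD-PSOLA.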

Journal: Pattern Recognition Letters
Publication Date: Dec 30, 2019
Citations: 40

Similar Papers

Developing speaker independent ASR system using limited data through prosody modification based on fuzzy classification of spectral bins
S Shahnawazuddin ... Hemant K Kathania
Digital Signal Processing | VOL. 93
11 Jul 2019

Explicit Pitch Mapping for Improved Children’s Speech Recognition
Hemant Kumar Kathania ... S Shahnawazuddin
Circuits, Systems, and Signal Processing | VOL. 37
11 Sep 2017

Developing children's ASR system under low-resource conditions using end-to-end architecture
Ankita ... S Shahnawazuddin
Digital Signal Processing | VOL. 146
08 Jan 2024

Developing children’s speech recognition system for low resource Punjabi language
Virender Kadyan ... Amitoj Singh
Applied Acoustics | VOL. 178
22 Mar 2021
