Abstract

The primary goal of this study is to develop an automatic speech recognition (ASR) system that, although trained on a limited amount of speech data, is least affected by speaker-dependent acoustic variations. This work focuses on two factors that contribute to inter-speaker variability: pitch and speaking-rate variations. To simulate such a limited-data scenario, an ASR system is trained on adults' speech and tested on speech data from both adult and child speakers. Compared to the adults' speech test case, the recognition rates are severely degraded when the test speech is from child speakers. The observed degradation is due to large differences in pitch and speaking rate between adults' and children's speech, along with other factors that lead to inter-speaker acoustic variations. To overcome the mismatch in pitch and speaking rate, two different approaches are proposed in this paper. In the first approach, the pitch and speaking rate of the children's speech test set are explicitly modified using a recently proposed prosody modification technique that exploits fuzzy classification of spectral bins. In the second approach, the pitch and speaking rate of the training data are modified to create new versions of the data. To capture greater acoustic variability, the original and modified versions are then pooled together. The ASR system trained on this augmented data is more robust to pitch and speaking-rate variations. Consequently, relative improvements of 17% and 31% over the baseline are obtained on decoding the adults' and children's speech test sets, respectively.
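The second approach (modify the pitch and speaking rate of the training data, then pool the modified copies with the originals) can be sketched as a routine that builds an augmented training manifest. This is a minimal sketch under stated assumptions: the names `AugmentedUtterance`, `build_augmented_manifest`, and `relative_improvement` are hypothetical, the scale factors in the usage line are illustrative rather than the values used in this study, and the actual signal processing (the fuzzy-classification-based prosody modification) is not implemented here; it is assumed to be applied by a separate step that consumes the manifest. The relative-improvement helper assumes word error rate (WER) as the metric, which the abstract does not state.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class AugmentedUtterance:
    """One entry of the pooled training set (hypothetical structure)."""
    utt_id: str
    pitch_factor: float  # 1.0 means pitch is left unmodified
    rate_factor: float   # 1.0 means speaking rate is left unmodified


def build_augmented_manifest(utt_ids, pitch_factors, rate_factors):
    """Pool each original utterance with its prosody-modified copies.

    Only metadata is produced; the pitch/speaking-rate modification itself
    is assumed to be carried out later by a separate processing step.
    """
    manifest = []
    for utt in utt_ids:
        # The original, unmodified version is always kept in the pool.
        manifest.append(AugmentedUtterance(utt, 1.0, 1.0))
        for p, r in product(pitch_factors, rate_factors):
            # Skip the identity pair so the original is not duplicated.
            if (p, r) != (1.0, 1.0):
                manifest.append(AugmentedUtterance(utt, p, r))
    return manifest


def relative_improvement(baseline_wer, new_wer):
    """Relative error-rate reduction in percent (assumes WER as the metric)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Illustrative usage with hypothetical factors and error rates:
manifest = build_augmented_manifest(["utt1", "utt2"], [0.9, 1.0, 1.1], [1.0, 1.2])
print(len(manifest))                     # 2 utterances x 6 versions each
print(relative_improvement(40.0, 30.0))  # hypothetical WERs, not results from this study
```

Keeping the original version alongside every modified copy mirrors the pooling described above: the augmented set widens the range of pitch and speaking rate seen in training without discarding the unmodified adult speech.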
