Abstract

Automatic speech recognition (ASR) systems frequently operate in noisy environments. As they are often trained on clean speech data, noise reduction or adaptation techniques are applied to decrease the influence of background disturbance, even in the case of unknown conditions. Speech data mixed with noise recordings from a particular environment are often used for model adaptation. This paper analyses the improvement of recognition performance within such adaptation when multi-condition training data from a real environment are used for training the initial models. Although the quality of such models can decrease with the presence of noise in the training material, they are assumed to include initial information about noise and consequently to support the adaptation procedure. Experimental results show that the proposed training method yields a significant improvement in a robust ASR task under unknown noisy conditions. Compared with training on clean speech data, the word error rate decreased by 29 % and 14 % for the non-adapted and adapted system, respectively.

Highlights

  • Automatic Speech Recognition (ASR) in a noisy environment has been a challenging issue in recent decades for many research centers, as the presence of noise significantly decreases the accuracy of ASR systems

  • The paper shows the advantages of using multi-condition training data for robust ASR in unknown background conditions

  • The main contribution of the work lies in using recordings from a real environment, which reflect the real influence of noise in a robust recognition task

Summary

Introduction

Automatic speech recognition (ASR) in a noisy environment has been a challenging issue for many research centers in recent decades, as the presence of noise significantly decreases the accuracy of ASR systems. Several adaptation techniques use background noise, which is combined either with the speech signal, e.g., in multi-environment models [4], or with acoustic models in parallel model combination (PMC) [5]. Other techniques use noisy speech data to adapt acoustic models to particular background conditions, either by retraining the clean speech models or by a transformation using maximum likelihood linear regression (MLLR) [6] or maximum a posteriori (MAP) adaptation [7]. The latter two schemes are used for speaker adaptation with only a small proportion of adaptation material.
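To make the MAP idea mentioned above more concrete, the following is a minimal sketch of the textbook MAP update for a single Gaussian mean, written in plain Python. It is not taken from the paper: the function name `map_adapt_mean`, the prior weight `tau`, and the toy feature vectors are all assumptions chosen for illustration. The update interpolates between the mean of the initial model and the occupancy-weighted mean of the adaptation data.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """MAP update of one Gaussian mean vector (illustrative sketch).

    prior_mean : mean of the Gaussian in the initial model
    frames     : adaptation feature vectors (e.g. from noisy speech)
    posteriors : per-frame occupation probability of this Gaussian
    tau        : prior weight; the default of 10.0 is an assumption
    """
    dim = len(prior_mean)
    occupancy = sum(posteriors)

    # Accumulate the occupancy-weighted sum of the adaptation frames.
    weighted_sum = [0.0] * dim
    for frame, gamma in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]

    # Interpolate between the prior mean and the data statistics.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


# Toy data (hypothetical): the prior mean is at the origin,
# while the adaptation frames are centred on [2.0, 2.0].
prior = [0.0, 0.0]
noisy_frames = [[1.0, 2.0], [3.0, 2.0]]
gammas = [1.0, 1.0]

little_data = map_adapt_mean(prior, noisy_frames, gammas, tau=10.0)
more_data = map_adapt_mean(prior, noisy_frames * 50, gammas * 50, tau=10.0)
```

With only two frames, `little_data` stays close to the prior mean, whereas with one hundred frames `more_data` moves most of the way towards the data mean. This behaviour is one reason MAP-style updates remain usable when only a small amount of adaptation material is available.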
