Abstract
In this paper, we study the performance of a single Gaussian model (SGM) set-based missing data imputation method in a large-vocabulary, speaker-independent continuous speech recognition task. We perform a series of automatic speech recognition (ASR) experiments on large-vocabulary, speaker-independent continuous Chinese speech distorted by two typical additive noises. Our experiments show that the SGM set-based missing data imputation method can greatly improve the robustness of an ASR system against additive noise. With ideal mask estimation, for speech distorted by stationary Gaussian white noise (SNR = 15 dB), word correctness improves from 32.31% to 57.84% and word accuracy from 12.09% to 51.02%. For speech distorted by non-stationary babble noise (SNR = 15 dB), word correctness improves from 48.62% to 68.20% and word accuracy from 29.74% to 62.36%. However, with spectral subtraction-based mask estimation, the mask estimation error, which is concrete and irreversible, can cause catastrophic recognition failure in a large-vocabulary, speaker-independent continuous speech system with a complex acoustic model.
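The abstract reports both word correctness and word accuracy without defining them. The sketch below assumes the conventional definitions used by common ASR scoring tools, where correctness ignores insertion errors and accuracy penalizes them; the paper itself may use a different scoring setup. The function names are illustrative and do not come from the paper. Under that assumption, the gap between the two figures is the insertion rate relative to the number of reference words.

```python
def word_correctness(n_ref, deletions, substitutions):
    """Percent of reference words recognized correctly (insertions ignored).

    Assumed definition: 100 * (N - D - S) / N.
    """
    return 100.0 * (n_ref - deletions - substitutions) / n_ref


def word_accuracy(n_ref, deletions, substitutions, insertions):
    """Percent accuracy, additionally penalizing inserted words.

    Assumed definition: 100 * (N - D - S - I) / N.
    """
    return 100.0 * (n_ref - deletions - substitutions - insertions) / n_ref


def implied_insertion_rate(correctness_pct, accuracy_pct):
    """Insertions as a percentage of reference words.

    Under the assumed definitions, correctness - accuracy = 100 * I / N.
    """
    return round(correctness_pct - accuracy_pct, 2)


# (correctness %, accuracy %) pairs quoted in the abstract, SNR = 15 dB.
reported = {
    "white noise, no imputation": (32.31, 12.09),
    "white noise, SGM imputation (ideal mask)": (57.84, 51.02),
    "babble noise, no imputation": (48.62, 29.74),
    "babble noise, SGM imputation (ideal mask)": (68.20, 62.36),
}

if __name__ == "__main__":
    for condition, (corr, acc) in reported.items():
        rate = implied_insertion_rate(corr, acc)
        print(f"{condition}: implied insertion rate = {rate}%")
```

Read this way, the reported figures suggest that imputation with an ideal mask reduces insertion errors as well as deletions and substitutions, which would explain why word accuracy rises more sharply than word correctness.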