Abstract
This paper proposes a novel mask estimation method for missing-feature reconstruction that improves speech recognition performance in time-varying background noise. Conventional mask estimation methods, which rely on noise estimates and spectral subtraction, fail to estimate the mask reliably. The proposed method determines the reliability of the input speech spectrum using a Posterior-based Representative Mean (PRM) vector, obtained as the sum of the mean parameters of the speech model weighted by their posterior probabilities. To obtain the noise-corrupted speech model, we employ the model combination method proposed in our previous study on feature compensation [1]. Experimental results demonstrate that the proposed mask estimation method is considerably more effective than conventional methods at improving speech recognition performance in time-varying background noise. Using the proposed PRM-based mask estimation for missing-feature reconstruction yields average relative WER improvements of +36.29% and +30.45% for the speech babble and background music conditions, respectively, compared to conventional mask estimation methods.
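The PRM vector described above is a posterior-weighted average of model means. The following minimal sketch illustrates only that computation. It assumes a Gaussian mixture speech model with diagonal covariances. It also assumes that a single model supplies both the posteriors and the means, because the abstract does not say which model plays each role. The function names and toy parameter values are hypothetical. How the PRM is then turned into a reliability mask is not specified here, so the sketch stops at the PRM itself.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def log_gaussian_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log-density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def component_posteriors(
    x: Vector,
    weights: Sequence[float],
    means: Sequence[Vector],
    variances: Sequence[Vector],
) -> List[float]:
    """Posterior P(k | x) of each mixture component k, via Bayes' rule."""
    log_joint = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(log_joint)  # shift by the max before exp for numerical stability
    unnorm = [math.exp(lj - peak) for lj in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def prm_vector(
    x: Vector,
    weights: Sequence[float],
    means: Sequence[Vector],
    variances: Sequence[Vector],
) -> List[float]:
    """Hypothetical helper: posterior-weighted sum of the component means.

    Assumption: the same mixture supplies both the posteriors and the means.
    """
    post = component_posteriors(x, weights, means, variances)
    dim = len(means[0])
    return [sum(p * m[d] for p, m in zip(post, means)) for d in range(dim)]


if __name__ == "__main__":
    # Toy two-component model over 2-D features (illustrative values only).
    w = [0.5, 0.5]
    mu = [[0.0, 0.0], [10.0, 10.0]]
    var = [[1.0, 1.0], [1.0, 1.0]]
    print(prm_vector([5.0, 5.0], w, mu, var))  # equidistant frame -> [5.0, 5.0]
```

A frame lying close to one component yields a PRM close to that component's mean. A frame equidistant from two equally weighted components yields their midpoint.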