Abstract

The application of Missing Data Techniques (MDT) to increase the noise robustness of HMM/GMM-based large vocabulary speech recognizers is hampered by a large computational burden: the likelihood evaluations require solving many constrained least squares (CLSQ) optimization problems. As an alternative, researchers have proposed frontend MDT or have made oversimplifying independence assumptions for the backend acoustic model. In this article, we propose a fast Multi-Candidate (MC) approach that solves the per-Gaussian CLSQ problems approximately by selecting the best from a small set of candidate solutions, which are generated as the MDT solutions on a reduced set of cluster Gaussians. Experiments show that MC MDT runs as fast as the uncompensated recognizer while achieving the accuracy of the full backend optimization approach. The experiments also show that exploiting the more accurate acoustic model of the backend does pay off in terms of accuracy when compared to frontend MDT.
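The per-Gaussian CLSQ problem and its Multi-Candidate approximation can be sketched as follows. For a Gaussian with mean mu and precision matrix P, the clean-speech estimate x minimizes (x - mu)^T P (x - mu) subject to the missing-data constraints, and the likelihood is evaluated at that x. The MC idea replaces this optimization, for every backend Gaussian, by choosing among a few candidates obtained by solving the CLSQ problem only for cluster Gaussians. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the constraint set (reliable channels fixed to the observation, unreliable channels bounded above by it) is the usual log-spectral MDT assumption; the Gaussians are taken to be full-covariance and already expressed over log-spectral features (any feature transform is omitted); "best" is assumed to mean the lowest CLSQ objective; the exact solver is plain projected gradient descent; and all function names, sizes, and toy data are hypothetical.

```python
import numpy as np

def feasible_bounds(y, reliable):
    """Assumed MDT box constraints in the log-spectral domain: reliable channels
    are fixed to the observation y, unreliable ones are upper-bounded by it."""
    lower = np.where(reliable, y, -np.inf)
    upper = y.copy()
    return lower, upper

def mahalanobis(x, mu, prec):
    """CLSQ objective (x - mu)^T P (x - mu); smaller means higher likelihood."""
    d = x - mu
    return float(d @ prec @ d)

def gaussian_loglik(x, mu, prec):
    """Log-density of N(mu, P^-1) evaluated at x."""
    _, logdet_prec = np.linalg.slogdet(prec)
    dim = len(mu)
    return -0.5 * (dim * np.log(2 * np.pi) - logdet_prec + mahalanobis(x, mu, prec))

def solve_clsq(mu, prec, y, reliable, iters=2000):
    """Exact per-Gaussian CLSQ via projected gradient descent (one possible solver)."""
    lower, upper = feasible_bounds(y, reliable)
    step = 1.0 / np.linalg.eigvalsh(prec).max()  # 1/L for the half objective
    x = np.clip(mu, lower, upper)                # feasible starting point
    for _ in range(iters):
        # gradient step on 0.5 * (x - mu)^T P (x - mu), then project onto the box
        x = np.clip(x - step * (prec @ (x - mu)), lower, upper)
    return x

def multi_candidate_loglik(mu, prec, candidates):
    """MC approximation: score every feasible candidate, keep the best one."""
    scores = [mahalanobis(c, mu, prec) for c in candidates]
    best = int(np.argmin(scores))
    return best, gaussian_loglik(candidates[best], mu, prec)

rng = np.random.default_rng(0)
DIM = 6

def random_gaussian():
    """Toy full-covariance Gaussian with a well-conditioned precision matrix."""
    a = 0.3 * rng.normal(size=(DIM, DIM))
    return rng.normal(size=DIM), a @ a.T + np.eye(DIM)

y = rng.normal(size=DIM)                                      # toy noisy log-spectrum
reliable = np.array([True, False, True, False, False, True])  # toy missing-data mask
clusters = [random_gaussian() for _ in range(3)]   # small set of cluster Gaussians
backend = [random_gaussian() for _ in range(20)]   # many backend Gaussians

# Exact CLSQ is solved only for the cluster Gaussians; those solutions are the candidates.
candidates = [solve_clsq(mu, prec, y, reliable) for mu, prec in clusters]

for mu, prec in backend[:3]:
    exact = gaussian_loglik(solve_clsq(mu, prec, y, reliable), mu, prec)
    best, approx = multi_candidate_loglik(mu, prec, candidates)
    print(f"candidate {best}: MC {approx:.3f} vs exact {exact:.3f}")
```

Because every candidate satisfies the same constraints, the MC score can only underestimate the exact per-Gaussian likelihood; the saving comes from running the optimizer for a handful of cluster Gaussians and reducing each backend Gaussian to a few distance evaluations.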

Highlights

  • One of the major concerns in deploying Automatic Speech Recognition (ASR) applications is the lack of robustness of the technology when compared to human listeners

  • (a) In the case of No Missing Data Techniques (MDT), the time for cluster Gaussian (CG) evaluation is replaced by that of Fast Removal of Gaussians (FRoG). (b) In cluster-based reconstruction with Maximum A Posteriori (MAP) estimation, the CG evaluation time includes the likelihood calculation for 500 PROSPECT CGs, which is equivalent to evaluating 1000 Mutual Information-based Discriminant Analysis (MIDA) Gaussians

  • Spectral CGs cannot provide channel estimates the way PROSPECT CGs can. This claim is motivated by an experiment in which the log-spectral CGs provide the clean-speech candidates and trigger the Backend Gaussian (BG) selection while PROSPECT CGs are used for channel estimation; this combination improved the recognition accuracy by 3.58% relative on test sets 8-14

Summary

Introduction

One of the major concerns in deploying Automatic Speech Recognition (ASR) applications is the lack of robustness of the technology when compared to human listeners. Many approaches that reduce the mismatch between training and test conditions have been proposed to improve the noise robustness of speech recognition. They modify either the frontend signal preprocessing or the backend acoustic model of the recognizer. In the late 1990s, Missing Data Techniques (MDT) were introduced in speech recognition as a perceptually motivated approach to improving the noise robustness of a speech recognizer. Applying MDT to the backend of a large vocabulary recognizer, however, carries a large computational burden. This was already noticed in [15], where the problem was addressed by compromising on the acoustic model (diagonal Gaussians for log-spectral features). The proposed Multi-Candidate (MC) approach is an algorithm that aims at computational gains for large vocabulary speech recognizers without sacrificing accuracy or robustness. It provides a solution for applying MDT to an existing backend model trained on the speech feature vector of one's choice.

Spectral and Cepstral MDT systems

In this section, we review some of the concepts of MDT that lead to the approaches most related to the proposed system.
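One way to see why diagonal Gaussians for log-spectral features, as used in [15], keep MDT cheap: with a diagonal covariance the CLSQ objective is a sum of independent per-channel terms, and the assumed missing-data constraints are per-channel bounds, so the optimum has a closed form. A reliable channel keeps its observed value, and an unreliable channel takes the Gaussian mean clipped to the observed upper bound. The NumPy sketch below illustrates this; the function names and toy values are hypothetical, and the constraint set is assumed to be the usual bounded log-spectral one. With a full covariance, or after a linear transform such as the DCT to cepstra, the channels become coupled and an iterative solver is needed in general.

```python
import numpy as np

def diag_mdt_estimate(mu, y, reliable):
    """Closed-form CLSQ solution for a diagonal log-spectral Gaussian.

    Reliable channels are fixed to the observation; unreliable channels take
    the mean, clipped to the (assumed) upper bound given by the observation.
    The variances do not affect the estimate because the problem decouples.
    """
    return np.where(reliable, y, np.minimum(mu, y))

def diag_loglik(x, mu, var):
    """Log-density of a diagonal Gaussian evaluated at x."""
    return float(-0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var))

mu = np.array([1.0, 2.0, -1.0])
var = np.array([0.5, 1.0, 2.0])
y = np.array([0.5, 1.0, 0.0])              # toy noisy log-spectrum
reliable = np.array([True, False, False])  # toy missing-data mask

x = diag_mdt_estimate(mu, y, reliable)
print(x.tolist())  # [0.5, 1.0, -1.0]
print(diag_loglik(x, mu, var))
```

Here channel 0 is reliable and keeps y, channel 1 is unreliable with its mean above the observation and is clipped to y, and channel 2 is unreliable with its mean already below the bound and keeps the mean.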

Bounds
Data and models
Recognizer
Baselines
Findings
Conclusions and future work