Abstract

Uncertainty decoding has recently improved automatic speech recognition performance in noisy environments by treating the pre-processed feature vectors not as deterministic values but as random variables that carry estimation errors, residual noise, and artifacts introduced by the signal pre-processors themselves. However, the achievable improvements depend strongly on how well the statistics of these random variables are estimated in the recognition domain. In this paper, we compare two approaches for estimating these statistics. The first estimates the needed statistics directly in the recognition domain. The second estimates the statistics in the processing domain and then propagates them through the typically nonlinear feature extraction to obtain the corresponding statistics in the recognition domain. Based on this comparison, we propose a new hybrid approach that combines the advantages of both approaches and avoids their disadvantages. The hybrid approach can be used with any speech pre-processor, which allows uncertainty decoding to be used more widely in place of the conventional maximum likelihood approach.
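
As a rough illustration of the two ideas named above, and not the method of this paper, the following sketch assumes a scalar feature with Gaussian uncertainty, a plain logarithm standing in for the nonlinear feature extraction, and a single Gaussian acoustic model. The function names `propagate_through_log` and `uncertainty_decoding_loglik` are hypothetical. Propagation is done here by simple Monte Carlo sampling, and the decoding rule adds the feature variance to the model variance, which is the usual closed form when both densities are Gaussian.

```python
import math
import random


def propagate_through_log(mean, var, num_samples=20000, seed=0):
    """Monte Carlo estimate of the mean and variance of log(x).

    Assumption: x is Gaussian with the given mean and variance in the
    processing domain; non-positive samples are rejected because the
    logarithm is undefined there.
    """
    rng = random.Random(seed)
    std = math.sqrt(var)
    samples = []
    while len(samples) < num_samples:
        x = rng.gauss(mean, std)
        if x > 0.0:
            samples.append(math.log(x))
    m = sum(samples) / num_samples
    v = sum((s - m) ** 2 for s in samples) / (num_samples - 1)
    return m, v


def uncertainty_decoding_loglik(feat_mean, feat_var, model_mean, model_var):
    """Gaussian log-likelihood with the feature variance added to the model variance.

    With feat_var == 0 this reduces to the ordinary log-likelihood of a
    point estimate under the model Gaussian.
    """
    total_var = model_var + feat_var
    diff = feat_mean - model_mean
    return -0.5 * (math.log(2.0 * math.pi * total_var) + diff * diff / total_var)


if __name__ == "__main__":
    # Processing-domain statistics (illustrative values only).
    rec_mean, rec_var = propagate_through_log(mean=10.0, var=1.0)
    print("recognition-domain mean/variance:", rec_mean, rec_var)
    print("log-likelihood:", uncertainty_decoding_loglik(rec_mean, rec_var, 2.0, 0.5))
```

In this sketch an unreliable feature (large `feat_var`) is penalized less for lying far from the model mean than a feature treated as exact, which is the effect uncertainty decoding relies on.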
