Abstract

A conventional feature compensation module for robust automatic speech recognition is usually designed separately from the training of the recognizer's hidden Markov model (HMM) parameters, although a maximum-likelihood (ML) criterion may be used in both designs. In this paper, we present an environment-compensated minimum classification error (MCE) training approach for the joint design of the feature compensation module and the recognizer itself. The feature compensation module is based on a stochastic vector mapping function whose parameters, in a previous approach called SPLICE, had to be learned from stereo data. In our proposed MCE joint design approach, the parameters are initialized with an approximate ML training procedure, which removes the requirement for stereo data. Evaluated on the Aurora2 connected digits database, the proposed approach achieves a digit recognition error rate of 5.66%, averaged over all three test sets, for multicondition training. Compared with the baseline system using the ETSI advanced front-end, our approach achieves an additional overall error rate reduction of 12.4%.
