Abstract

The minimum classification error (MCE) framework is an approach to discriminative training for pattern recognition that explicitly incorporates a smoothed version of classification performance into the recognizer design criterion. Many studies have confirmed the effectiveness of MCE for speech recognition. In this article, we present a theoretical analysis of the smoothness of the MCE loss function. Specifically, we show that the MCE criterion function is equivalent to a Parzen window-based estimate of the theoretical classification risk. In this analysis, each training token is mapped to the center of a Parzen kernel in the domain of a suitably defined random variable. The kernels are summed to produce a density estimate, which can then easily be integrated over the domain of incorrect classifications, yielding the risk estimate. The resulting expression of risk for each kernel corresponds directly to the usual MCE loss function, with the specific form of the Parzen window determining the specific form of the MCE loss. The derivation presented here shows that the smooth MCE loss function, far from being an ad hoc approximation of the true error, can be seen as the direct consequence of using a well-understood type of smoothing, Parzen estimation, to estimate the theoretical risk from a finite training set. This analysis provides a novel link between the MCE empirical cost measured on a finite training set and the theoretical classification risk.
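The correspondence described above can be illustrated numerically. The sketch below is a hedged illustration, not taken from the article: it assumes each training token is summarized by a scalar misclassification score `d_k` (positive when the token is misclassified), centers a logistic Parzen kernel of bandwidth `h` on each score, and integrates the summed density over the error region `d > 0`. Because the logistic kernel's CDF is the sigmoid, this Parzen risk estimate coincides with the average sigmoid loss commonly used in MCE training, with the kernel bandwidth playing the role of the sigmoid slope.

```python
import numpy as np

def parzen_risk(d_tokens, h=1.0):
    """Parzen-window estimate of classification risk.

    A logistic kernel of bandwidth h is centered on each token's
    misclassification score d_k; the summed density is then numerically
    integrated over the error region d > 0 (Riemann sum on a fine grid).
    """
    grid = np.arange(-50.0, 50.0, 1e-3)   # support wide enough for the tails
    density = np.zeros_like(grid)
    for d in d_tokens:
        s = 1.0 / (1.0 + np.exp(-(grid - d) / h))
        density += s * (1.0 - s) / h      # logistic kernel = sigmoid'((x-d)/h)/h
    density /= len(d_tokens)
    return density[grid > 0].sum() * 1e-3  # integrate density over d > 0

def mce_loss(d_tokens, h=1.0):
    """Usual smoothed MCE loss: mean sigmoid of the misclassification score."""
    d = np.asarray(d_tokens, dtype=float)
    return float(np.mean(1.0 / (1.0 + np.exp(-d / h))))

# The two quantities agree: integrating each kernel over d > 0 gives
# 1 - sigmoid(-d_k / h) = sigmoid(d_k / h), i.e. the per-token MCE loss.
scores = [-2.0, 0.5, 3.0]
print(parzen_risk(scores), mce_loss(scores))
```

The hypothetical scores and the choice of a logistic kernel are illustrative; as the abstract notes, other kernel shapes would yield other smooth MCE loss functions by the same construction.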
