In this article, we studied the objective functions of MMI, MCE, and MPE/MWE for discriminative learning in sequential pattern recognition. We presented an approach that unifies the objective functions of MMI, MCE, and MPE/MWE in a common rational-function form of (25). The exact structure of the rational-function form for each discriminative criterion was derived and studied. While the rational-function form of MMI has been known in the past, we provided the theoretical proof that the similar rational-function form exists for the objective functions of MCE and MPE/MWE. Moreover, we showed that the rational function forms for objective functions of MMI, MCE, and MPE/MWE differ in the constant weighting factors CDT (s1 . . . sR) and these weighting factors depend only on the labeled sequence s1 . . . sR, and are independent of the parameter set - to be optimized. The derived rational-function form for MMI, MCE, and MPE/MWE allows the GT/EBW-based parameter optimization framework to be applied directly in discriminative learning. In the past, lack of the appropriate rational-function form was a difficulty for MCE and MPE/MWE, because without this form, the GT/EBW-based parameter optimization framework cannot be directly applied. Based on the unified rational-function form, in a tutorial style, we derived the GT/EBW-based parameter optimization formulas for both discrete HMMs and CDHMMs in discriminative learning using MMI, MCE, and MPE/MWE criteria. The unifying review provided in this article has been based upon a large number of earlier contributions that have been cited and discussed throughout the article. Here we provide a brief summary of such background work. Extension to large-scale speech recognition tasks was accomplished in the work of [59] and [60]. The dissertation of [47] further improved the MMI criterion to that of MPE/MWE. In a parallel vein, the work of [20] provided an alternative approach to that of [41], with an attempt to more rigorously provide a CDHMM model re-estimation formula that gives positive growth of the MMI objective function. A crucial error of this attempt was corrected in [2] for establishing an existence proof of such positive growth. The main goal of this article is to provide an underlying foundation for MMI, MCE, and MPE/MWE at the objective function level to facilitate the development of new parameter optimization techniques and to incorporate other pattern recognition concepts, e.g., discriminative margins [66], into the current discriminative learning paradigm.
Read full abstract