Abstract
We propose a novel decoding framework that dynamically combines K plug-in maximum a posteriori (MAP) decoders, each of which solves for a sequence of symbols state by state in time, subject to a set of constraints on the symbol sequences in space. Scores are combined at the state level, with the K combination weights either set equal (the equal weighting scheme) or learned from a collection of data in a hierarchical Bayesian setting. When applied to automatic speech recognition (ASR), the framework leverages characteristic differences in how feed-forward deep neural networks (DNNs) and Gaussian mixture models (GMMs) compute acoustic probabilities at the hidden Markov phone state level, so that these scores can be discriminatively combined in plug-in MAP decoding. The DNN and GMM parameters can be trained on a large collection of speaker-independent (SI) speech data and further refined with a small set of speaker adaptation (SA) utterances. The per-speaker, per-state combination weights can be learned from the SA data through the proposed hierarchical Bayesian approach. Experimental results on the Switchboard ASR task show that an ad hoc fixed-weight combination already reduces the word error rate (WER) from an SI WER of 17.4% to 16.9%. Model adaptation with 20 utterances reduces the WER to 16.7%, and using the SA models with fixed-weight combination decoding reduces it further to 16.1%. The best WER, 15.3%, is attained by combining the two SA and two SI systems with weights learned by the proposed hierarchical Bayesian approach. Finally, we contrast the proposed technique with a state-of-the-art static system combination approach based on minimum Bayes risk and multiple word lattices generated by different ASR systems. The experimental results show that static system combination does not improve on the performance of the individual systems, and that the proposed dynamic combination scheme is needed.
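The state-level score combination described above can be illustrated with a minimal sketch. The abstract does not give the combination formula, so the code below assumes a weighted sum of per-state log-scores (a log-linear combination), which may differ from the authors' actual rule. The function names `combine_state_scores` and `best_state`, the two toy models, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def combine_state_scores(log_scores, weights=None):
    """Combine K per-state log-scores into one score per state.

    log_scores[k][s] is the log-score of state s under model k.
    weights[s][k] is the weight of model k for state s; when weights
    is None, every model gets the equal weight 1/K.

    Assumption: the combination is a weighted sum in the log domain.
    """
    num_models = len(log_scores)
    num_states = len(log_scores[0])
    if weights is None:
        weights = [[1.0 / num_models] * num_models for _ in range(num_states)]
    return [
        sum(weights[s][k] * log_scores[k][s] for k in range(num_models))
        for s in range(num_states)
    ]


def best_state(combined):
    """Return the index of the state with the highest combined score."""
    return max(range(len(combined)), key=combined.__getitem__)


# Toy scores for one frame and three states (made-up numbers).
dnn = [math.log(0.7), math.log(0.2), math.log(0.1)]
gmm = [math.log(0.3), math.log(0.6), math.log(0.1)]

# Equal weighting scheme.
equal = combine_state_scores([dnn, gmm])

# Hypothetical per-state weights that trust the GMM more for state 1.
per_state = combine_state_scores(
    [dnn, gmm],
    weights=[[0.5, 0.5], [0.1, 0.9], [0.5, 0.5]],
)

print(best_state(equal), best_state(per_state))
```

With these toy numbers the preferred state changes when the per-state weights are applied, which shows how state-dependent weights can alter a decoding decision that equal weighting would make differently. In the paper's setting the weights would be learned per speaker and per state from SA data rather than set by hand.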