Abstract

This paper presents an automatic speech recognition system whose acoustic models are based both on sub-phonetic units and on broad phonological features, such as "voiced" and "round", used as output densities in a hidden Markov model framework. The aim of this work is to improve recognition performance, particularly on conversational speech, by using units other than phones as a basis for discriminating between words. We explore the idea that phones are essentially a shorthand notation for a bundle of phonological features, and that these features can instead be used directly to distinguish competing word hypotheses. Acoustic models for the different features are integrated with phone models using a multi-stream approach with log-linear interpolation. We present a new lattice-based discriminative training algorithm that uses the maximum mutual information (MMI) criterion to train the stream weights. This algorithm allows stream weights to be learned automatically from training or adaptation data and can also be applied to other tasks. Decoding experiments against a non-feature baseline system on the large-vocabulary English Spontaneous Scheduling Task show reductions in word error rate of about 20% for discriminative model adaptation based on articulatory features, slightly outperforming other adaptation algorithms.
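(For concreteness, the stream combination and training criterion named above can be written in their standard forms; the notation below is a sketch of those standard formulations rather than the paper's own equations. With per-stream emission densities p_i(x|s) and stream weights \lambda_i, log-linear interpolation gives the combined state score

    \log p(x \mid s) \;=\; \sum_i \lambda_i \log p_i(x \mid s) \;-\; \log Z(x)   % Z(x): normalization term

and the weights \lambda = (\lambda_1, \dots, \lambda_n) are trained to maximize the MMI objective over training utterances (X_r, W_r):

    F_{\mathrm{MMI}}(\lambda) \;=\; \sum_r \log \frac{p_\lambda(X_r \mid W_r)\, P(W_r)}{\sum_W p_\lambda(X_r \mid W)\, P(W)}

where, in a lattice-based implementation, the denominator sum over competing hypotheses W is approximated by the word lattice.)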
