Abstract

The field of large vocabulary continuous speech recognition has advanced to the point where several systems achieve greater than 95% word accuracy for speaker-independent recognition of fluent speech drawn from a 1000-word vocabulary, on a task with a perplexity of about 60. Several factors account for the high performance of these systems, including effective feature analysis, hidden Markov model (HMM) methodology, context-dependent sub-word units that capture intra-word and inter-word phonemic variations, and corrective training techniques that emphasize differences between acoustically similar words in the vocabulary. In this paper we describe a large vocabulary continuous speech recognition system developed at AT&T Bell Laboratories and discuss the methods used to provide high word recognition accuracy. In particular, we focus on the techniques adopted to select the set of fundamental speech units and to build the acoustic models of these sub-word units within a continuous density HMM (CDHMM) framework. Other modeling approaches, such as a discrete HMM and a tied-mixture HMM, are also discussed and compared with the CDHMM approach.
