Abstract

Automatic recognition of sequences of spoken digits (e.g., telephone or credit card numbers) can be accomplished with excellent accuracy, even in speaker-independent applications over telephone links. However, even such relatively simple recognition tasks suffer decreased performance in adverse conditions, such as significant background noise or fading on portable telephone channels. If one further imposes significant limits on the computing resources dedicated to the recognition task, then robust, limited-resource speech recognition remains a challenge, even for a vocabulary as simple as the digits. Because connected-digit recognition over telephone lines is a very practical application, the computing resources needed for a given level of recognition accuracy were investigated for different levels and types of acoustic noise. Rather than use a traditional hidden Markov model approach with cepstral analysis, which is computationally intensive and does not always work well under adverse acoustic conditions, a simpler spectral analysis was used, combined with a segmental approach. The limited vocabulary (i.e., the ten digits) permits this simpler approach. High recognition accuracy is maintained despite a large reduction (versus traditional methods) in both memory and computation.
