Efficient automatic recognition of spoken digit strings

Douglas O’Shaughnessy, Hesham Tolba

doi:10.1121/1.4744144

Abstract

Automatic recognition of spoken digit sequences (such as credit card numbers) is now feasible even in speaker-independent applications over the telephone. However, recognition performance degrades in noisy conditions for all tasks. If tight limits are also imposed on the computational resources used for recognition, then robust speech recognition remains a significant challenge, even for a simple digit vocabulary. Since recognition of continuously spoken digits over telephone links is a very practical application, such recognition was investigated here under different conditions. Traditional hidden Markov model approaches with cepstral analysis were not used, because they are computationally intensive and have not always worked well under adverse acoustic conditions. Instead, a simpler spectral analysis was combined with a segmental approach. The analysis focuses on the locations of spectral peaks, similar to formant tracking, but without the need to estimate peaks for all time frames. The limited nature of the vocabulary (i.e., ten digits) allows this simpler approach. The method maintains high recognition accuracy while remaining very efficient in both memory and computation. Recent progress will be reported.
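
As a rough illustration of what locating spectral peaks in a single analysis frame can involve, the following minimal Python sketch windows a frame, computes a magnitude spectrum, and returns the frequencies of the strongest local maxima. It is not the authors' algorithm, which the abstract does not specify. The function names (`magnitude_spectrum`, `spectral_peaks`), the Hann window, the naive DFT, and the `max_peaks` parameter are all assumptions made for this example.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitude of a naive DFT for bins 0..N/2.

    A direct O(N^2) DFT is used only to keep the sketch dependency-free;
    a practical system would use an FFT.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc))
    return spectrum


def spectral_peaks(frame, sample_rate, max_peaks=3):
    """Return the frequencies (Hz) of the strongest local spectral maxima.

    Hypothetical helper: the peak-picking rule (simple local maxima ranked
    by magnitude) is an assumption, not the method described in the abstract.
    """
    n = len(frame)
    # Periodic Hann window to reduce spectral leakage.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    mag = magnitude_spectrum(windowed)
    # A bin is a candidate peak if it rises above its left neighbour
    # and is not below its right neighbour.
    candidates = [(mag[k], k) for k in range(1, len(mag) - 1)
                  if mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]]
    strongest = sorted(candidates, reverse=True)[:max_peaks]
    # Convert bin indices to Hz and return in ascending frequency order.
    return sorted(k * sample_rate / n for _, k in strongest)
```

Calling `spectral_peaks(frame, 8000, max_peaks=2)` on a 256-sample frame would return up to two peak frequencies in ascending order. A segmental recognizer in the spirit of the abstract would apply such an analysis only to selected frames rather than to every frame.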
