Abstract

A system is described for personal computers that permits voice recognition of queries received over a telephone, retrieval of the appropriate information from the computer database, voice synthesis of that information, and its transmission to the caller. Specifically, the generation of the database itself, the architecture of the recognition system, and the interfacing of the telephone system to the computer are delineated.

The system chosen for recognition and synthesis is built around a Texas Instruments TMS32010 programmable digital signal processor (DSP). This chip performs the so-called feature extraction (the extraction of defining parameters from the input speech signal to be recognized) and runs a proprietary algorithm, a modified FFT, that generates spectral information. This contrasts with a number of other speech recognition systems, which use linear predictive coding (LPC) techniques in conjunction with other technology. LPC is not used here because it is subject to corruption by noise. With the modified FFT algorithm, the system operates properly with background noise levels of up to 100 dB. The analog input signal is converted to digital form using PCM. The data output of the TMS32010 is then processed by a Motorola 6809 microprocessor working with a custom VLSI gate array that implements a dynamic programming algorithm. The 6809 handles interfacing with the host and assists with matching speech patterns, or templates. The gate array's dynamic programming algorithm lengthens or shortens words in time to account for variation in utterance and, together with the 6809, locates the beginnings and endings of words for matching. Thus, this chip set performs the recognition function. When the system is configured in speaker-dependent mode, the vocabulary is 150 words, comprising 300 templates in continuous mode.
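The dynamic programming step can be illustrated with classic dynamic time warping (DTW). This is a minimal sketch, not the gate array's actual algorithm, which the abstract does not specify. It assumes that one feature vector per frame (for example, spectral band energies from the FFT stage) has already been computed and that word endpoints are already known. The names `dtw_distance`, `recognize`, and the two-dimensional toy templates are hypothetical.

```python
import math
from typing import Mapping, Sequence

Frame = Sequence[float]  # one feature vector per time frame (hypothetical representation)


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(utterance: Sequence[Frame], template: Sequence[Frame]) -> float:
    """Classic DTW: minimum cumulative frame distance over all monotonic
    alignments of utterance frames to template frames."""
    n, m = len(utterance), len(template)
    # cost[i][j] = best cumulative cost aligning utterance[:i] with template[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(utterance[i - 1], template[j - 1])
            # Steps: diagonal match, stretch (reuse a template frame),
            # or compress (advance the template without a new utterance frame).
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


def recognize(utterance: Sequence[Frame], templates: Mapping[str, Sequence[Frame]]) -> str:
    """Return the label of the template with the lowest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Hypothetical toy templates with 2-element feature vectors.
templates = {
    "yes": [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]],
    "no": [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]],
}
# A slower "yes": same spectral trajectory, middle frame held longer.
slow_yes = [[1.0, 0.0], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.0, 1.0]]
print(recognize(slow_yes, templates))  # yes
```

Because the allowed steps let one template frame absorb several utterance frames (and vice versa), a word spoken more slowly or quickly than its template can still align at low cost. Practical recognizers usually add slope constraints and length normalization, which are omitted here.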
For voice synthesis, the expected or predetermined responses are entered, digitized at rates from 4 kbps to 24 kbps, and compressed. When output in response to a command or from a database, the stored voice data is expanded, producing natural-sounding speech. Two EPROMs contain the firmware. For speaker-independent recognition, no training is required: the speech patterns are adjusted and matched to previously derived templates generated by the manufacturer. For this particular system, the stored templates used for matching are yes, no, and the numbers 1 through 4. Also present is a telephone interface board that accepts commands from the touch-tone keypad or by voice, either speaker dependent or independent. If the telephone interface is used, the speaker-dependent portion must be trained over a telephone connection rather than through a microphone.

For this system, the most common applications are general ones, such as voice (or touch-tone keypad) queries of a computer database with the resulting voice response, although the system can also be used as a sophisticated answering machine and call-out device. Examples of applications include inventory control, automated retrieval of parts from a storage facility, and a dictation machine.
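To give a sense of scale for the synthesis data, the storage cost per second of speech follows from dividing the bit rate by 8 bits per byte. This sketch assumes the quoted 4 to 24 kbps figures are the rates at which responses are stored; the 64 KiB capacity is a hypothetical figure chosen for illustration, not one given in the abstract.

```python
def storage_bytes(bit_rate_kbps: float, seconds: float) -> float:
    """Bytes needed for `seconds` of speech at `bit_rate_kbps` (1 kbps = 1000 bit/s)."""
    return bit_rate_kbps * 1000 * seconds / 8


def seconds_of_speech(bit_rate_kbps: float, capacity_bytes: int) -> float:
    """Seconds of speech that fit in `capacity_bytes` at `bit_rate_kbps`."""
    return capacity_bytes * 8 / (bit_rate_kbps * 1000)


HYPOTHETICAL_CAPACITY = 64 * 1024  # illustrative 64 KiB store, not from the source

for rate in (4, 24):
    print(f"{rate} kbps: {storage_bytes(rate, 1):.0f} bytes/s, "
          f"{seconds_of_speech(rate, HYPOTHETICAL_CAPACITY):.1f} s in 64 KiB")
```

At the low end of the range one second of speech costs 500 bytes, and at the high end 3,000 bytes, so the chosen rate typically trades speech quality against how many responses fit in a given amount of storage.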
