Phoneme Recognition Rate Research Articles

This paper describes a speaker-independent phoneme and word recognition system based on a recurrent error propagation network (REPN) trained on the TIMIT database. The REPN is a fully recurrent network trained by propagating the gradient signal backwards in time. The weights are updated by a variant of stochastic gradient descent that takes an adaptive step size in the direction given by the sign of the gradient. Phonetic context is stored internally in the network, and the outputs are estimates of the probability that a given frame is part of a segment labelled with a context-independent phonetic symbol. During recognition, a dynamic programming match finds the most probable string of symbols; the one-pass algorithm is used for both phoneme and word recognition. The phoneme recognition rate for all 61 TIMIT symbols is 70·0% correct (63·5% accuracy including insertion errors), and on a reduced 39-symbol set it is 76·5% correct (69·8% accuracy). This compares favourably with the results of other methods, such as HMMs, on the same database [K. F. Lee & H. W. Hon 1989. IEEE Transactions on Acoustics, Speech and Signal Processing, 37, 1641–1648; S. E. Levinson, M. Y. Liberman, A. Ljolje & L. G. Miller 1989. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Glasgow, pp. 441–444]. Analysis of the phoneme recognition results shows that the information available from bigram and durational constraints is adequately handled within the network, which allows efficient parsing of the network output. The resulting scheme also involves less computation than a one-state-per-phoneme HMM system. This is demonstrated by applying the recognizer to the DARPA 1000-word resource management task: parsing the network output to the word level with no grammar and no pruning can be carried out faster than real time on a SUN 4 330 workstation.
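
The abstract reports two figures per symbol set, "correct" and "accuracy", where accuracy also counts insertion errors. The sketch below shows one common way such figures are computed from a reference and a recognized symbol string, assuming the usual definitions: correct = (N - S - D) / N and accuracy = (N - S - D - I) / N, with N reference symbols, S substitutions, D deletions and I insertions. The function names and the tie-breaking between equally cheap alignments are assumptions made for illustration, not the scoring procedure used in the paper.

```python
from typing import List, Tuple


def align_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) for one minimum-edit alignment."""
    # Each cell holds (total_errors, subs, dels, ins); tuples compare by total first.
    prev = [(j, 0, 0, j) for j in range(len(hyp) + 1)]  # empty ref: all insertions
    for i in range(1, len(ref) + 1):
        cur = [(i, 0, i, 0)]  # empty hyp: all deletions
        for j in range(1, len(hyp) + 1):
            c, s, d, n = prev[j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, s, d, n)          # symbols match, no error
            else:
                diag = (c + 1, s + 1, d, n)  # substitution
            c, s, d, n = prev[j]
            dele = (c + 1, s, d + 1, n)      # reference symbol missing from hyp
            c, s, d, n = cur[j - 1]
            ins = (c + 1, s, d, n + 1)       # extra symbol in hyp
            cur.append(min(diag, dele, ins))
        prev = cur
    _, subs, dels, inss = prev[-1]
    return subs, dels, inss


def percent_correct_and_accuracy(ref: List[str], hyp: List[str]) -> Tuple[float, float]:
    """Percent correct ignores insertions; accuracy penalizes them as well."""
    if not ref:
        raise ValueError("reference string must not be empty")
    subs, dels, inss = align_counts(ref, hyp)
    n = len(ref)
    correct = 100.0 * (n - subs - dels) / n
    accuracy = 100.0 * (n - subs - dels - inss) / n
    return correct, accuracy


if __name__ == "__main__":
    reference = ["sil", "k", "ae", "t", "sil"]
    recognized = ["sil", "k", "ae", "ae", "t", "sil"]  # one spurious "ae"
    print(percent_correct_and_accuracy(reference, recognized))
```

In this toy case every reference symbol is found, so the string scores 100% correct, but the single inserted symbol lowers the accuracy to 80%. The same effect explains why the accuracy figures quoted above are lower than the corresponding correct figures.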

We describe a method of recognizing isolated words and phrases from a given vocabulary spoken by any member of a given group of speakers, the identity of the speaker being unknown to the system. The word utterance is divided into 20-30 nearly equal frames, with frame boundaries aligned with glottal pulses for voiced speech, so that each frame contains a constant number of pitch periods. Statistical decision rules are used to determine the phoneme in each frame. From the string of phonemes obtained over all the frames of the utterance, a word decision is reached using (phonological) syntactic rules. These rules are of two types: 1) those obtained from the theory of word construction from phonemes in English, as applied to our vocabulary, and 2) those that correct possible errors in earlier phonemic decisions on the basis of the decisions for neighboring segments. In our experiment, the vocabulary had 40 words and included many pairs of words that are phonemically close to each other. There were 6 speakers. In testing 400 word utterances, the recognition rate was about 80 percent for phonemes (for 11 phonemes), but word recognition was 98.1 percent correct. Phonological-syntactic rules played an important role in raising the word recognition rate above the phoneme recognition rate.
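
The pipeline described above (a phoneme decision per frame, correction from neighboring segments, then a word decision from the phoneme string) can be sketched roughly as follows. This is only an illustration under stated assumptions: a majority vote over adjacent frames stands in for the abstract's error-correcting rules, a dictionary lookup stands in for its word-construction rules, and the phoneme labels, the `LEXICON` entries and all function names are hypothetical rather than taken from the paper.

```python
from collections import Counter
from itertools import groupby
from typing import Dict, List, Optional, Tuple

# Hypothetical lexicon of phonemically close words (not the paper's vocabulary).
LEXICON: Dict[Tuple[str, ...], str] = {
    ("s", "i", "t"): "sit",
    ("z", "i", "t"): "zit",
    ("s", "i", "p"): "sip",
}


def smooth_frames(labels: List[str], window: int = 1) -> List[str]:
    """Relabel a frame only when its neighbourhood has a strict majority label."""
    smoothed = []
    for i in range(len(labels)):
        lo, hi = max(0, i - window), min(len(labels), i + window + 1)
        best, best_n = Counter(labels[lo:hi]).most_common(1)[0]
        # Without a strict majority, keep the original frame decision.
        smoothed.append(best if best_n > (hi - lo) / 2 else labels[i])
    return smoothed


def collapse(labels: List[str]) -> List[str]:
    """Merge runs of identical frame labels into a single phoneme each."""
    return [label for label, _ in groupby(labels)]


def recognize_word(frames: List[str],
                   lexicon: Dict[Tuple[str, ...], str],
                   window: int = 1) -> Optional[str]:
    """Smooth the frame decisions, collapse them, and look the string up."""
    phonemes = collapse(smooth_frames(frames, window))
    return lexicon.get(tuple(phonemes))


if __name__ == "__main__":
    # Third frame is misclassified as "z" by the (hypothetical) frame classifier.
    frames = ["s", "s", "z", "s", "i", "i", "i", "t", "t", "t"]
    print(collapse(frames))                 # raw string contains the error
    print(recognize_word(frames, LEXICON))  # neighbour correction recovers the word
```

The example shows how a single frame-level error, which would make the raw phoneme string match no vocabulary entry, can be absorbed before the word decision. This is consistent with the abstract's observation that word recognition can be far more reliable than frame-level phoneme recognition.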

Articles published on Phoneme Recognition Rate

A recurrent error propagation network speech recognition system

Phoneme recognition with elliptic discrimination neural units

Speech recognition using HMMs with an LVQ-trained codebook

Duration control methods for HMM phoneme recognition

Recognition of Spoken Words and Phrases in Multitalker Environment Using Syntactic Methods
