Speaker-independent Speech Recognition Research Articles

In a traditional speech recognition system, the distance score between a test token and a reference pattern is obtained by simply averaging the distortion sequence that results from matching the two patterns through a dynamic programming procedure. The final decision is made by choosing the reference with the minimal average distance score. If one views the distortion sequence as a form of observed features, a decision rule based on a discriminant function designed specifically for the distortion sequence will obviously perform better than one based on the simple average distortion. The authors therefore suggest a linear discriminant function of the form $\Delta = \sum_{i=1}^{T} w(i)\, d(i)$ to compute the distance score $\Delta$, instead of the direct average $\Delta = \frac{1}{T} \sum_{i=1}^{T} d(i)$. Several adaptive algorithms are proposed to learn the discriminant weighting function $w(i)$: one heuristic method, two methods based on the error propagation algorithm, and one method based on the generalized probabilistic descent (GPD) algorithm. The authors study these methods in a speaker-independent speech recognition task involving utterances of the highly confusable English E-set (b, c, d, e, g, p, t, v, z). The results show that the best performance is obtained with the GPD method, which achieved 78.1% accuracy, compared to 67.6% with the traditional unweighted average method. Besides the experimental comparisons, an analytical discussion of the various training algorithms is also provided.
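
The difference between the two scoring rules in the abstract above can be illustrated with a minimal Python sketch. It assumes each distortion sequence has already been normalized to a common length T, and the function names, toy distortion values, and hand-picked weights are hypothetical; in the paper the weights are learned (for example by GPD), which is not shown here.

```python
from typing import Dict, Optional, Sequence


def average_score(d: Sequence[float]) -> float:
    """Traditional score: Delta = (1/T) * sum_i d(i)."""
    return sum(d) / len(d)


def weighted_score(d: Sequence[float], w: Sequence[float]) -> float:
    """Linear discriminant score: Delta = sum_i w(i) * d(i)."""
    if len(d) != len(w):
        raise ValueError("distortion sequence and weights must have equal length")
    return sum(wi * di for wi, di in zip(w, d))


def classify(distortions: Dict[str, Sequence[float]],
             weights: Optional[Sequence[float]] = None) -> str:
    """Return the reference label with the minimal distance score.

    With weights=None the plain average is used; otherwise the weighted sum.
    """
    if weights is None:
        scores = {label: average_score(d) for label, d in distortions.items()}
    else:
        scores = {label: weighted_score(d, weights) for label, d in distortions.items()}
    return min(scores, key=scores.get)


# Toy distortion sequences of one test token against two references (made-up values).
distortions = {
    "b": [1.0, 2.0, 3.0],
    "d": [3.0, 2.0, 0.5],
}

# Hypothetical weights that emphasize the early frames of the alignment.
weights = [0.6, 0.3, 0.1]

by_average = classify(distortions)
by_weights = classify(distortions, weights)
```

Setting every weight to 1/T recovers the traditional average, so the weighted rule is a strict generalization; the two rules can disagree, as they do for the toy values here, whenever the weights emphasize frames where the references differ.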

Speaker-independent speech recognition experiments using an auditory model front end with a spectro-temporal masking model showed improved recognition performance: this front end outperformed both auditory front ends without the masking model and traditional LPC-based front ends. The auditory model front end, composed of an adaptive Q cochlear filter bank incorporating spectro-temporal masking, was proposed previously [J. Acoust. Soc. Am. 92, 2476 (A) (1992)]. The spectro-temporal masking model can enhance common phonetic features by eliminating the speaker-dependent spectral tilt that reflects individual source variation. It can also enhance the spectral dynamics that convey phonological information in speech signals. These advantages yield an effective new spectral parameter for representing speech models in speaker-independent speech recognition. Speaker-independent word and phoneme recognition experiments were carried out on Japanese word and phrase databases. The masked spectrum was calculated by subtracting the masking level from logarithmic power spectra extracted using a 64-channel adaptive Q cochlear filter bank. The masking levels were calculated as the weighted sum of the smoothed preceding spectra. To cover the variability of the time sequences of the spectrum, multi-template DTW and hidden Markov models were used as the back-end recognition mechanisms.

a) Also at ATR Auditory and Visual Perception Res. Labs.
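
The masked-spectrum computation described in the abstract above (masking level as a weighted sum of smoothed preceding spectra, subtracted from the log power spectrum) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the smoothing kernel or the temporal weights, so the moving average across channels and the weights used here are hypothetical, and the input is taken to be log power spectra that have already been extracted by some filter bank.

```python
from typing import List, Sequence


def smooth_across_channels(frame: Sequence[float], half_width: int = 1) -> List[float]:
    """Moving average over neighboring channels (assumed smoothing, not from the source)."""
    n = len(frame)
    out = []
    for k in range(n):
        lo = max(0, k - half_width)
        hi = min(n, k + half_width + 1)
        out.append(sum(frame[lo:hi]) / (hi - lo))
    return out


def masked_spectrum(log_spectra: Sequence[Sequence[float]],
                    time_weights: Sequence[float],
                    half_width: int = 1) -> List[List[float]]:
    """Subtract a masking level from each log-spectrum frame.

    The masking level at frame t is sum over lag of time_weights[lag - 1]
    times the smoothed frame at t - lag, for lag = 1, 2, ...
    Frames before the start of the signal contribute nothing.
    """
    smoothed = [smooth_across_channels(f, half_width) for f in log_spectra]
    masked = []
    for t, frame in enumerate(log_spectra):
        masking = [0.0] * len(frame)
        for lag, w in enumerate(time_weights, start=1):
            if t - lag < 0:
                break  # no earlier frames available
            prev = smoothed[t - lag]
            for k in range(len(frame)):
                masking[k] += w * prev[k]
        masked.append([x - m for x, m in zip(frame, masking)])
    return masked


# Toy input: four identical frames of a 4-channel log spectrum (made-up values).
steady = [[1.0, 1.0, 1.0, 1.0] for _ in range(4)]

# Hypothetical temporal weights over the two preceding frames; they sum to 1.
result = masked_spectrum(steady, time_weights=[0.5, 0.5])
```

In this toy case the stationary component is cancelled once enough preceding frames are available, which is consistent with the abstract's description of masking as suppressing steady spectral characteristics while leaving spectral changes visible.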

Articles published on Speaker-independent Speech Recognition

Speaker independent speech recognition system and method using neural network and DTW matching technique

Frame-correlated hidden Markov model based on extended logarithmic pool

A log-index weighted cepstral distance measure for speech recognition

Method and system for identifying and recognizing speech

Speech recognition based on variable information rate model

Speaker-independent speech recognition based on tree-structured speaker clustering

Speaker independent speech recognition system and method using neural network and/or DP matching technique

Normalizing the vocal tract length for speaker independent speech recognition

Vowel recognition using an articulatory representation

Large-vocabulary continuous speech recognition algorithm applied to a multi-modal telephone directory assistance system

Reviewing automatic language identification

Issues in feature-based recognition of speech mixed with impulsive sounds

A Korean large vocabulary speech recognition system for automatic telephone number query service

A very large vocabulary continuous speech recognition algorithm for telephone directory assistance

A speaker-independent continuous speech recognition system using continuous mixture Gaussian density HMM of phoneme-sized units

Discriminative analysis of distortion sequences in speech recognition

Speaker-independent speech recognition using an auditory model front end incorporating the spectro-temporal masking effect

Speaker independent speech recognition process

Speaker independent speech recognition based on neural networks of each category with embedded eigenvectors

Speaker adaptation in speech recognition using linear regression techniques
