Category Pair Research Articles

This paper proposes a model of a spectrum target prediction mechanism and, based on that model, a preprocessing method for automatic speech recognition that copes with coarticulation. The model is constructed to predict a particular spectrum for each phoneme (the phoneme target) and to keep the predicted spectra constant within each phoneme interval. The method is evaluated by four measures: spectrum sequence stability (is the predicted spectrum sequence in each phoneme interval fixed?); intra-category spectrum variation (is the variation of predicted spectra within each phoneme category small?); inter-category spectrum variation (are phoneme category pairs far apart, as measured by the Mahalanobis distance?); and length of transitional sounds (how long do incorrectly recognized results last within a phoneme interval?). Experimental results indicate that the spectra predicted by the model are stabilized in each phoneme interval. Moreover, with the method, intra-category variation decreases and inter-category variation increases. The results also indicate that the model recovers vowel characteristics neutralized by coarticulation at spectral transition portions and shortens the duration of transitional sounds. Consequently, the spectrum target prediction model, implemented as a speech recognition preprocessor, reduces recognition error rates.
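The inter-category measure above relies on the Mahalanobis distance between a pair of phoneme categories. The following is a minimal sketch of one common way to compute such a distance, not the paper's own procedure: it assumes each category is a list of feature vectors and that the two categories share a pooled covariance matrix, neither of which is stated in the abstract. All function names are hypothetical.

```python
import math

def mean_vector(samples):
    """Component-wise mean of a list of equal-length feature vectors."""
    n, dim = len(samples), len(samples[0])
    return [sum(s[k] for s in samples) / n for k in range(dim)]

def scatter_matrix(samples, mean):
    """Sum of outer products of deviations from the mean."""
    dim = len(mean)
    scatter = [[0.0] * dim for _ in range(dim)]
    for s in samples:
        d = [s[k] - mean[k] for k in range(dim)]
        for i in range(dim):
            for j in range(dim):
                scatter[i][j] += d[i] * d[j]
    return scatter

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def category_pair_distance(cat_a, cat_b):
    """Mahalanobis distance between two category means.

    Assumption (not from the source): both categories share one
    pooled covariance matrix estimated from their combined scatter.
    """
    mean_a, mean_b = mean_vector(cat_a), mean_vector(cat_b)
    sa, sb = scatter_matrix(cat_a, mean_a), scatter_matrix(cat_b, mean_b)
    dof = len(cat_a) + len(cat_b) - 2
    dim = len(mean_a)
    pooled = [[(sa[i][j] + sb[i][j]) / dof for j in range(dim)]
              for i in range(dim)]
    diff = [mean_a[k] - mean_b[k] for k in range(dim)]
    weighted = solve(pooled, diff)  # pooled^-1 * diff without an explicit inverse
    return math.sqrt(sum(diff[k] * weighted[k] for k in range(dim)))

# Toy two-dimensional data, invented purely for illustration.
vowel_a = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
vowel_b = [[3.0, 4.0], [5.0, 4.0], [3.0, 6.0], [5.0, 6.0]]
print(category_pair_distance(vowel_a, vowel_b))
```

Under this reading, a preprocessor that shrinks the spread within each category or moves the category means apart would raise the returned distance, which matches the direction of improvement the abstract reports.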

This paper presents a model of a lower-level contextual effect that can cope with coarticulation problems, especially vowel neutralization. The model is constructed to make spectral peak trajectories overshoot on the basis of spectral peak interaction, assuming that the lower-level contextual effect can be represented as the sum of the interactions between each spectral peak pair. The interaction function is determined experimentally so as to reduce the distance between a real spectral peak and its target, which is the mean spectral peak computed for vowels uttered in isolation. The interaction function thus determined suggests that: (1) there can be a time-frequency lateral inhibition in the auditory system like that on the retina in the visual system, (2) the interaction function is consistent with the results of psychoacoustic experiments on the assimilation and/or contrast effect using paired single-formant stimuli, and (3) the contextual effect between adjacent phonemes can be represented as the sum of the assimilation and/or contrast effects between each spectral peak pair. When the determined interaction function is applied to real speech data to cope with the coarticulation problem, spectral peak trajectories overshoot, spectral peaks at the vowel center approach their own targets, and the distance between each vowel category pair increases.
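The additive assumption above (the contextual effect on a peak is the sum of its interactions with every other peak) can be sketched as follows. This only illustrates the summation structure. The abstract does not give the experimentally determined interaction function, so `toy_contrast` is a hypothetical placeholder with invented constants, and the (time, frequency) representation of a peak is likewise an assumption.

```python
def contextual_shift(target_peak, context_peaks, interaction):
    """Sum the pairwise interaction terms acting on one spectral peak.

    Peaks are (time_s, freq_hz) tuples (an assumed representation).
    `interaction` maps (time offset, frequency offset) to a frequency
    shift in Hz for the target peak.
    """
    t0, f0 = target_peak
    return sum(interaction(t - t0, f - f0) for t, f in context_peaks)

def toy_contrast(dt, df):
    """Hypothetical placeholder, NOT the function determined in the paper.

    Pushes the target away from a neighboring peak (a contrast effect),
    with a strength that decays as the peaks separate in time or frequency.
    """
    if df == 0:
        return 0.0
    strength = 50.0 / (1.0 + abs(dt) / 0.05 + abs(df) / 500.0)
    return -strength if df > 0 else strength

target = (0.0, 1000.0)
neighbors = [(0.0, 1500.0), (0.05, 700.0)]
print(contextual_shift(target, neighbors, toy_contrast))
```

Passing the interaction function as a parameter keeps the summation separate from whatever function is actually fitted, so a measured function could be substituted without changing the rest of the sketch.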

Articles published on Category Pair

Spectrum target prediction model and its application to speech recognition

Modeling of contextual effect based on spectral peak interaction

Preschool Children's Performance with Perceptual and Conceptual Recognition Criteria

Category similarity effects in children's semantic memory retrieval
