Abstract
Direct recognition of phonemes in speaker-independent speech recognition systems still cannot guarantee sufficiently accurate recognition results. A promising alternative is to first group phonemes into classes and then recognize the individual phoneme within its group. Wavelets, meanwhile, are widely used in speech and speaker recognition systems because wavelet coefficients capture important time and frequency features. In this work, the effect of the wavelet filter type on the efficiency of a phoneme recognition system is investigated, specifically for fricatives. A probabilistic neural network (PNN) was used as the pattern-matching stage because of its well-known strength in solving classification problems. It was found that the Daubechies wavelet family (generally from db15 to db23) is a good candidate for a fricative phoneme recognition system that uses wavelets as its feature extraction stage.
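To make the pattern-matching stage concrete, the following is a minimal sketch of a probabilistic neural network classifier in plain Python. It assumes feature vectors (for example, wavelet-derived features) have already been extracted, uses a Gaussian kernel with equal class priors, and picks a hypothetical smoothing parameter `sigma`; the function name `pnn_classify` and the toy data are illustrative only and are not the authors' implementation.

```python
import math
from collections import defaultdict


def pnn_classify(train, x, sigma=0.5):
    """Return the label whose averaged Gaussian kernel response at x is largest.

    train: list of (feature_vector, label) pairs
    x:     feature vector to classify
    sigma: kernel smoothing parameter (hypothetical default)
    """
    sums = defaultdict(float)   # summation layer: per-class kernel sums
    counts = defaultdict(int)   # number of training patterns per class
    for vec, label in train:
        # pattern layer: one Gaussian kernel centred on each training pattern
        dist2 = sum((a - b) ** 2 for a, b in zip(vec, x))
        sums[label] += math.exp(-dist2 / (2.0 * sigma ** 2))
        counts[label] += 1
    # output layer: average response per class (equal priors assumed), take the max
    return max(sums, key=lambda c: sums[c] / counts[c])


# Toy two-class example with hypothetical labels "s" and "f".
train = [([0.0, 0.1], "s"), ([0.1, 0.0], "s"),
         ([1.0, 0.9], "f"), ([0.9, 1.0], "f")]
print(pnn_classify(train, [0.05, 0.05]))  # -> s
print(pnn_classify(train, [0.95, 0.95]))  # -> f
```

A PNN needs no iterative training: "training" amounts to storing the labelled patterns, and only `sigma` has to be tuned, which is one reason it is convenient when the same classifier must be retrained many times.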
Highlights
Automatic speech recognition (ASR) is a process by which a machine identifies speech
This procedure is repeated for every type of wavelet filter; as a result, the training and testing phases are carried out 85 times
The effect of the wavelet filter type on phoneme recognition was examined in a phoneme recognition system based on wavelets and a neural network
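The evaluation loop in the highlights (train and test once per wavelet filter) can be sketched as below. This is an illustration under several stated assumptions: synthetic noisy sinusoids stand in for fricative recordings, only two hand-coded filters (db1/Haar and db2) are compared instead of the paper's full set, the features are relative sub-band energies from a 3-level periodic DWT (the source does not specify which wavelet features were used), and the PNN smoothing value `sigma=0.1` is hypothetical. All function names are illustrative.

```python
import math
import random
from collections import defaultdict

SQ2, SQ3 = math.sqrt(2.0), math.sqrt(3.0)
# Low-pass decomposition filters, hard-coded for illustration only.
# Longer Daubechies filters (e.g. db15-db23) would come from a wavelet library.
FILTERS = {
    "db1": [1 / SQ2, 1 / SQ2],
    "db2": [(1 + SQ3) / (4 * SQ2), (3 + SQ3) / (4 * SQ2),
            (3 - SQ3) / (4 * SQ2), (1 - SQ3) / (4 * SQ2)],
}


def dwt_step(x, h):
    """One level of a periodic DWT: returns (approximation, detail)."""
    n, L = len(x), len(h)
    g = [(-1) ** k * h[L - 1 - k] for k in range(L)]  # quadrature-mirror high-pass
    approx = [sum(h[k] * x[(2 * i + k) % n] for k in range(L)) for i in range(n // 2)]
    detail = [sum(g[k] * x[(2 * i + k) % n] for k in range(L)) for i in range(n // 2)]
    return approx, detail


def subband_features(x, h, levels=3):
    """Relative energy of each detail band plus the final approximation band."""
    energies = []
    for _ in range(levels):
        x, d = dwt_step(x, h)
        energies.append(sum(v * v for v in d))
    energies.append(sum(v * v for v in x))
    total = sum(energies) or 1.0
    return [e / total for e in energies]


def pnn_classify(train, x, sigma=0.1):
    """Gaussian-kernel PNN with equal class priors."""
    sums, counts = defaultdict(float), defaultdict(int)
    for vec, label in train:
        dist2 = sum((a - b) ** 2 for a, b in zip(vec, x))
        sums[label] += math.exp(-dist2 / (2 * sigma ** 2))
        counts[label] += 1
    return max(sums, key=lambda c: sums[c] / counts[c])


def synthetic_signal(label, rng, n=64):
    """Hypothetical stand-in for a phoneme frame: a noisy sinusoid per class."""
    cycles = 24 if label == "high" else 4
    return [math.sin(2 * math.pi * cycles * t / n) + rng.gauss(0, 0.1) for t in range(n)]


def evaluate(filter_name, n_train=20, n_test=20, seed=0):
    """Train and test the PNN using features from one wavelet filter."""
    rng = random.Random(seed)
    h = FILTERS[filter_name]

    def make(k):
        return [(subband_features(synthetic_signal(lab, rng), h), lab)
                for lab in ("high", "low") for _ in range(k)]

    train, test = make(n_train), make(n_test)
    correct = sum(pnn_classify(train, vec) == lab for vec, lab in test)
    return correct / len(test)


# Repeat the training and testing phases once per filter type.
for name in FILTERS:
    print(name, evaluate(name))
```

Comparing filter types then reduces to adding entries to `FILTERS` and reading off one accuracy per entry; the loop body itself does not change.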
Summary
Automatic speech recognition (ASR) is a process by which a machine identifies speech. The machine takes a human utterance as input and returns a string of words, phrases, or continuous speech in the form of text as output. As ASR technology matures, the range of possible applications increases. A domain- and speaker-independent system able to correctly decode all speech found in communication between people into strings of words is not realistic with the current state of technology [1]. The acoustic realization of the same word or utterance pronounced by different speakers can differ considerably. Even the same speaker cannot pronounce the same word or phrase identically several times. Phonemic speech recognition must therefore cope with large variation in the realization of the same phoneme, and this degrades phoneme recognition accuracy [2].