Abstract

This paper presents a novel method for classifying speech phonemes. Four hybrid techniques, combining the acoustic-phonetic approach with the pattern recognition approach, are used to develop the principal idea of this research. The first hybrid model consists of a fixed-state, structured Hidden Markov Model, Gaussian Mixture, Mel-scaled Best Tree Image, Convolutional Neural Network, and Vector Quantization (FS-HMM-GM-MBTI-CNN-VQ). The second hybrid model consists of a variable-state, dynamically structured Hidden Markov Model, Gaussian Mixture, Mel-scaled Best Tree Image, Convolutional Neural Network, and Vector Quantization (VS-HMM-GM-MBTI-CNN-VQ). The third hybrid model is the fixed-state model without Vector Quantization (FS-HMM-GM-MBTI-CNN), and the fourth is the variable-state model without Vector Quantization (VS-HMM-GM-MBTI-CNN). The TIMIT database is used in this paper. All phones are grouped into five broad classes: vowels, plosives, fricatives, nasals, and silences. The results show that VS-HMM-GM-MBTI-CNN-VQ is a viable method for phoneme classification, with potential applications in automatic speech recognition and automatic language identification. Competitive results are achieved, especially for nasals, plosives, and silences, which show higher success rates than the other classes.
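The abstract groups all TIMIT phones into five broad classes. A minimal Python sketch of such a grouping follows. The phone-to-class assignment is an assumption for illustration only (in particular, folding liquids and glides into vowels, treating /hh/ and /hv/ as fricatives, and treating stop closures as silence); it is not necessarily the mapping used in the paper. The names `BROAD_CLASSES`, `PHONE_TO_CLASS`, and `to_broad_classes` are hypothetical.

```python
# Hypothetical grouping of the 61 TIMIT phone labels into the five broad
# classes named in the abstract. The placement of liquids, glides, /hh/, /hv/
# and stop closures is an assumption; the paper's assignment may differ.
BROAD_CLASSES = {
    "vowel": {
        "iy", "ih", "eh", "ey", "ae", "aa", "aw", "ay", "ah", "ao", "oy",
        "ow", "uh", "uw", "ux", "er", "ax", "ix", "axr", "ax-h",
        # Assumption: liquids and glides are folded into the vowel class.
        "l", "r", "w", "y", "el",
    },
    "plosive": {"b", "d", "g", "p", "t", "k", "dx", "q"},
    # Assumption: affricates and /hh/, /hv/ are grouped with fricatives.
    "fricative": {"s", "sh", "z", "zh", "f", "th", "v", "dh", "jh", "ch", "hh", "hv"},
    "nasal": {"m", "n", "ng", "em", "en", "eng", "nx"},
    # Assumption: stop closures are treated as silence.
    "silence": {"h#", "pau", "epi", "bcl", "dcl", "gcl", "pcl", "tcl", "kcl"},
}

# Invert the table so each phone label looks up its broad class directly.
PHONE_TO_CLASS = {p: cls for cls, phones in BROAD_CLASSES.items() for p in phones}


def to_broad_classes(phones):
    """Map a sequence of TIMIT phone labels to broad-class labels."""
    return [PHONE_TO_CLASS[p] for p in phones]


if __name__ == "__main__":
    print(to_broad_classes(["h#", "sh", "iy", "n", "dcl", "d", "h#"]))
    # ['silence', 'fricative', 'vowel', 'nasal', 'silence', 'plosive', 'silence']
```

Keeping the grouping as one explicit table makes it easy to swap in the paper's own assignment once it is known.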

Highlights

  • Speech is the most effective and widely used means of human communication, and it is produced as a sequence of phonemes

  • The success rate (SR) is calculated for each case, as listed in Table 2. SR is defined by Equation 1, in which D denotes deletions, S denotes substitutions, and N denotes the number of phones in the expected transcription. Fig. 14 shows each SR value against the Gaussian mixture model (GMM)

  • The first hybrid feature set consists of the Mel Best Tree Image, a Convolutional Neural Network, and Vector Quantization (MBTI-CNN-VQ)
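The success rate depends on counting deletions (D) and substitutions (S) against the N reference phones. The sketch below assumes Equation 1 takes the common percent-correct form SR = (N - D - S) / N × 100, which is consistent with the variables defined above but is not confirmed by this text. D and S are obtained here from a standard minimum-edit-distance alignment; the function names `align_counts` and `success_rate` are hypothetical.

```python
def align_counts(ref, hyp):
    """Count substitutions, deletions and insertions in a minimum-edit-distance
    alignment of a reference label sequence against a hypothesis."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = (errors, subs, dels, ins) for ref[:i] aligned with hyp[:j];
    # min() on tuples picks the fewest total errors first.
    cost = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # every reference label deleted
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # every hypothesis label inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, k = cost[i - 1][j - 1]
            match = ref[i - 1] == hyp[j - 1]
            diag = (e, s, d, k) if match else (e + 1, s + 1, d, k)
            e, s, d, k = cost[i - 1][j]
            up = (e + 1, s, d + 1, k)  # deletion of a reference label
            e, s, d, k = cost[i][j - 1]
            left = (e + 1, s, d, k + 1)  # insertion of a hypothesis label
            cost[i][j] = min(diag, up, left)
    _, subs, dels, ins = cost[n][m]
    return subs, dels, ins


def success_rate(ref, hyp):
    """Assumed form of Equation 1: SR = (N - D - S) / N * 100."""
    if not ref:
        raise ValueError("reference transcription is empty")
    subs, dels, _ = align_counts(ref, hyp)
    n = len(ref)
    return 100.0 * (n - dels - subs) / n


if __name__ == "__main__":
    ref = ["vowel", "nasal", "plosive", "silence"]
    hyp = ["vowel", "plosive", "silence"]
    print(align_counts(ref, hyp))  # (0, 1, 0)
    print(success_rate(ref, hyp))  # 75.0
```

Insertions are counted but, as in the assumed formula, do not lower SR; a measure that also penalises insertions would subtract them from the numerator as well.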


INTRODUCTION

Speech is the most effective and widely used means of human communication, and it is produced as a sequence of phonemes. From these phonemes, we extract the feature vectors required by the classification method. Classifying speech sounds in this way supports applications such as speech recognition and language recognition. The broad phone classes are usually defined as vowels, plosives, fricatives, nasals, and silence. This categorization can improve speech recognition, and several categorization techniques have been attempted.
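Vector quantization, one component of the hybrid models, maps each extracted feature vector to the index of its nearest codeword. The sketch below assumes a plain k-means (Lloyd) codebook with squared Euclidean distance; the codebook training procedure is not given in this section, so this is an illustration rather than the authors' method. The helpers `train_codebook` and `quantize` and the toy 2-D vectors are hypothetical stand-ins for real acoustic feature vectors.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, v):
    """Index of the codeword closest to vector v (first index wins ties)."""
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], v))


def train_codebook(vectors, size, iters=20, seed=0):
    """Plain k-means (Lloyd) codebook training, initialised from random samples."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iters):
        # Assignment step: group vectors by their nearest codeword.
        clusters = [[] for _ in range(size)]
        for v in vectors:
            clusters[nearest(codebook, v)].append(v)
        # Update step: move each codeword to the mean of its cluster.
        for k, members in enumerate(clusters):
            if members:  # keep the old codeword if its cluster is empty
                dim = len(members[0])
                codebook[k] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return codebook


def quantize(codebook, vectors):
    """Replace each feature vector with the index of its nearest codeword."""
    return [nearest(codebook, v) for v in vectors]


if __name__ == "__main__":
    frames = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    codebook = train_codebook(frames, size=2)
    print(quantize(codebook, frames))
```

In a discrete-observation setup, the resulting indices could serve as the symbols fed to a downstream classifier; the codebook size and iteration count here are arbitrary choices.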

RELATED WORK
Hidden Markov Model with Gaussian Mixture Model
Database
Procedure of proposed model
RESULTS AND DISCUSSIONS
RESULTS
CONCLUSIONS