Abstract

Distinctive phonetic features play an important role in Arabic phoneme recognition. In a given language, distinctive phonetic features are extrapolated from acoustic features using different methods. However, using a lengthy acoustic feature vector for phoneme recognition carries a high computational cost, which in turn affects real-time applications. The aim of this work is to consider methods for reducing the size of the feature vector employed for distinctive phonetic feature and phoneme recognition. The objective is to select the relevant input features that contribute to the speech recognition process. This, in turn, leads to reduced computational complexity of the recognition algorithm and improved recognition accuracy. In the proposed approach, a genetic algorithm is used to perform optimal feature selection. To that end, a baseline model based on feedforward neural networks is first built. This model is used to benchmark the proposed feature selection method against a method that employs all elements of the feature vector. Experimental results on the King Abdulaziz City for Science and Technology Arabic Phonetic Database show that the average overall phoneme recognition accuracy of the genetic algorithm based method remains slightly higher than that of the recognition method employing the full feature vector. The genetic algorithm based distinctive phonetic feature recognition method achieved a 50% reduction in the dimension of the input vector while obtaining a recognition accuracy of 90%. Moreover, the results of the proposed method are validated using the Wilcoxon signed-rank test.
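To make the idea of genetic algorithm based feature selection concrete, the following is a minimal sketch and not the authors' implementation. Each chromosome is a binary mask over the input features (1 means the feature is kept). The work described here scores candidates with a feedforward neural network; since that model is not available in this summary, the sketch assumes a toy fitness function instead. The names `run_ga`, `toy_fitness`, `RELEVANT`, and `N_FEATURES`, as well as all parameter values, are hypothetical.

```python
import random

# Hypothetical setup: 16 input features, of which 6 are treated as "relevant".
N_FEATURES = 16
RELEVANT = {0, 3, 4, 7, 9, 12}

def toy_fitness(mask):
    """Stand-in for classifier accuracy (an assumption, not the paper's metric):
    reward each relevant feature kept, penalize each irrelevant one."""
    hits = sum(mask[i] for i in RELEVANT)
    extras = sum(mask) - hits
    return hits - 0.5 * extras

def run_ga(fitness, n_features, pop_size=30, generations=40,
           crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Evolve a binary feature mask that maximizes `fitness`.
    Returns the best mask and the best fitness seen after each generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]

    def tournament():
        # Binary tournament: pick two individuals, keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_pop = [best[:]]  # elitism: the current best always survives
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history

best_mask, history = run_ga(toy_fitness, N_FEATURES)
selected = [i for i, bit in enumerate(best_mask) if bit]
print("selected feature indices:", selected)
```

In a real experiment, `toy_fitness` would be replaced by a function that trains or evaluates the recognizer on the masked feature vector and returns its validation accuracy, which is where most of the computational cost would lie.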

Highlights

  • Performance of automatic speech recognition (ASR) systems is highly affected by the input features that are extracted from the speech waveform

  • Features used in ASR systems include acoustic features such as the spectrogram, mel-frequency cepstral coefficients (MFCCs), and short-time energy, to name a few

  • Each variant of genetic algorithms (GAs) has a different time complexity depending on its implementation; for example, the time complexity is shown to be polynomial of degree two in [53], whereas in [54] it is proportional to the number of samples in the training set multiplied by the squared number of total features, and in [55] it is proportional to the number of features under investigation

Summary

Introduction

Performance of automatic speech recognition (ASR) systems is highly affected by the input features that are extracted from the speech waveform. Features used in ASR systems include acoustic features such as the spectrogram, mel-frequency cepstral coefficients (MFCCs), and short-time energy, to name a few. Another highly representative type of feature is the distinctive phonetic feature (DPF). DPFs are introduced to a system as binary vectors, where each bit describes the presence or absence (denoted as + or –, respectively) of some articulatory or acoustic property associated with the utterance of a particular phoneme. DPFs are language dependent: each spoken language has its own finite set of DPFs, and a unique binary vector is assigned to each phoneme of the language [1].
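The binary-vector representation described above can be sketched as follows. This is an illustrative toy only: the three features and four phonemes are generic phonological examples chosen for brevity, not the Arabic DPF set used in the work, and the names `FEATURES`, `DPF_TABLE`, `to_bits`, and `nearest_phoneme` are hypothetical. The ASCII characters `+` and `-` stand in for the presence and absence markers.

```python
# Illustrative feature inventory (an assumption, not the paper's DPF set).
FEATURES = ("voiced", "nasal", "fricative")

# One sign per feature, in the order given by FEATURES.
DPF_TABLE = {
    "b": "+--",  # voiced, not nasal, not fricative
    "m": "++-",  # voiced, nasal, not fricative
    "s": "--+",  # voiceless, not nasal, fricative
    "z": "+-+",  # voiced, not nasal, fricative
}

def to_bits(signs):
    """Convert a +/- string into a binary vector (1 = feature present)."""
    return [1 if s == "+" else 0 for s in signs]

def nearest_phoneme(bits):
    """Map a predicted DPF vector to the phoneme with the smallest Hamming distance."""
    def hamming(phoneme):
        return sum(a != b for a, b in zip(to_bits(DPF_TABLE[phoneme]), bits))
    return min(DPF_TABLE, key=hamming)

# Every phoneme must have its own vector for the mapping to be invertible.
assert len(set(DPF_TABLE.values())) == len(DPF_TABLE)

print(to_bits(DPF_TABLE["m"]))
print(nearest_phoneme([1, 1, 0]))
```

Because each phoneme has a unique vector, a recognizer that predicts the individual DPF bits can recover the phoneme by a table lookup; the nearest-match step shown here is one simple, assumed way to tolerate a wrongly predicted bit.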

