Abstract

This paper studies how the performance of a phonetic engine (PE) varies with the set of spectral features selected for it. The study is carried out exclusively on a PE developed for the Manipuri language. We built the PE using phonetic transcriptions and by modeling each phonetic unit with a Hidden Markov Model (HMM). The collected data are transcribed using the symbols of the International Phonetic Alphabet (IPA, revised in 2005). Each phonetic unit is represented by a 5-state left-to-right HMM with 32 mixtures in each state. Speech feature extraction is a very important stage in the development of such a PE, since it largely determines the overall accuracy of the system; selecting a proper feature extraction technique is therefore crucial in building the PE. The speech and speaker recognition literature offers many feature extraction techniques, for example Linear Predictive Coding (LPC), Linear Predictive Cepstral Coefficients (LPCC), Mel-frequency Cepstral Coefficients (MFCC) and Perceptual Linear Prediction (PLP). In this paper, we analyze the performance of the Manipuri PE for three widely used spectral features (MFCC, PLP and LPCC) on three modes of collected data: Read, Lecture and Conversation. For each of these features, we use coefficient dimensions of 13, 26 and 39. Our analysis of the system's accuracy shows that PLP and MFCC are superior to LPCC under all conditions.
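The configuration described in the abstract can be illustrated with a minimal sketch. It assumes, as is conventional in speech recognition but not stated in the abstract, that the 26- and 39-dimensional vectors are obtained by appending first-order (delta) and second-order (delta-delta) regression coefficients to 13 static coefficients. It also assumes that all five HMM states are emitting and that transitions are limited to self-loops and moves to the next state. The function names `delta`, `stack_features` and `left_to_right_transitions`, the regression window of 2 frames, and the self-loop probability of 0.5 are hypothetical choices made for illustration, not values taken from the paper.

```python
import numpy as np


def delta(feat: np.ndarray, window: int = 2) -> np.ndarray:
    """Regression-based time derivative of a (frames, coeffs) feature array.

    The window of 2 frames is an assumed value, not taken from the paper.
    """
    n_frames = feat.shape[0]
    denom = 2 * sum(n * n for n in range(1, window + 1))
    # Repeat the edge frames so every frame has `window` neighbours on each side.
    padded = np.pad(feat, ((window, window), (0, 0)), mode="edge")
    out = np.zeros(feat.shape, dtype=float)
    for n in range(1, window + 1):
        ahead = padded[window + n : window + n + n_frames]
        behind = padded[window - n : window - n + n_frames]
        out += n * (ahead - behind)
    return out / denom


def stack_features(static: np.ndarray, order: int) -> np.ndarray:
    """Append `order` levels of deltas: order 0, 1, 2 gives 13, 26, 39 dims
    when `static` has 13 coefficients per frame (assumed layout)."""
    parts = [static.astype(float)]
    for _ in range(order):
        parts.append(delta(parts[-1]))
    return np.hstack(parts)


def left_to_right_transitions(n_states: int = 5, self_loop: float = 0.5) -> np.ndarray:
    """Transition matrix allowing only self-loops and steps to the next state.

    Treating all states as emitting and using 0.5 as the initial self-loop
    probability are illustrative assumptions.
    """
    trans = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        trans[i, i] = self_loop
        trans[i, i + 1] = 1.0 - self_loop
    trans[-1, -1] = 1.0  # final state absorbs
    return trans


if __name__ == "__main__":
    # Placeholder input: random numbers standing in for 13 static coefficients
    # (MFCC, PLP or LPCC) over 100 frames. This is not real speech data.
    static = np.random.default_rng(0).normal(size=(100, 13))
    for order in (0, 1, 2):
        print(order, stack_features(static, order).shape)
    print(left_to_right_transitions())
```

Under these assumptions the same stacking step applies regardless of which static feature (MFCC, PLP or LPCC) is used, so the three dimensionalities can be compared on equal terms for each feature type.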

