Abstract

This paper reports a new method of phonemic analysis of speech by statistical pattern recognition techniques and its application to the problem of Automatic Speech Recognition (ASR). An on-line, adaptive, trainable, speaker-independent system is implemented using this approach. The system operates as follows. First, the beginning and end points of the speech utterance are detected. The utterance is then automatically segmented into four classes: (1) voiced, (2) unvoiced, (3) transition, and (4) silence. An 11-dimensional feature vector consisting of 10 linear predictor coefficients and the zero-crossing rate is extracted from these regions. For voiced and transition regions, feature extraction is done pitch-synchronously; for unvoiced regions, a constant frame of 6.4 msec is used. A new phonetic unit called the phoneme-pair is defined for the transition regions, while the unvoiced and voiced regions are represented using the phonemes of the IPA. Conditional probability densities for each of the phonemes and phoneme-pairs are estimated using non-parametric methods as a single polynomial in the 11-dimensional space. The classifier makes a Bayes minimum-risk decision based on these probability densities. The recognition accuracy of the ASR system is 98.4% on the training set; on the test set it is 96.0% for speakers in the training set and 91.0% for speakers not in the training set. The present vocabulary of the system is 60 words, and any new word can be added by entering its corresponding phonetic transcription. The adaptive and trainable characteristics of the system will also be demonstrated.
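As an illustration of the 11-dimensional feature vector described above (10 linear predictor coefficients plus the zero-crossing rate), the following is a minimal sketch only. The abstract does not say how the predictor coefficients are computed, so the autocorrelation method with the Levinson-Durbin recursion is assumed here; the function names `zero_crossing_rate`, `lpc_coefficients`, and `feature_vector` are hypothetical and are not taken from the paper.

```python
import numpy as np


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(np.asarray(frame, dtype=float))
    return float(np.mean(signs[1:] != signs[:-1]))


def lpc_coefficients(frame, order=10):
    """Predictor coefficients a[1..order] for s[n] ~ sum_k a[k] * s[n-k].

    Assumption: autocorrelation method solved by Levinson-Durbin;
    the paper may have used a different estimator.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Short-time autocorrelation r[0..order].
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    err = r[0]  # prediction error energy
    for i in range(1, order + 1):
        if err <= 0:  # silent or degenerate frame: stop early
            break
        # Reflection coefficient for model order i.
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a = a_new
        err *= 1.0 - k * k
    return a[1:]


def feature_vector(frame, order=10):
    """Concatenate the LPC coefficients and the zero-crossing rate."""
    return np.concatenate([lpc_coefficients(frame, order),
                           [zero_crossing_rate(frame)]])


# Hypothetical 64-sample frame, used only to show the output shape.
frame = np.random.default_rng(0).standard_normal(64)
features = feature_vector(frame)
```

In the system described, a frame would be one pitch period for voiced and transition regions, or a fixed 6.4 msec window for unvoiced regions; the sketch leaves frame selection to the caller.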
