Abstract

Speech is the most efficient and popular means of human communication, and it is produced as a sequence of phonemes. Phoneme recognition is the first step performed by an automatic speech recognition system. State-of-the-art recognizers use mel-frequency cepstral coefficient (MFCC) features derived through short-time analysis, for which the recognition accuracy is limited. Here, broad phoneme classification is instead achieved using features derived directly from the speech signal itself. The broad phoneme classes are vowels, nasals, fricatives, stops, approximants and silence. The features found useful for broad phoneme classification are the voiced/unvoiced decision, zero crossing rate (ZCR), short-time energy, most dominant frequency, energy in the most dominant frequency, spectral flatness measure and the first three formants. Features derived from short-time frames of training speech are used to train a multilayer feedforward neural network classifier with manually marked class labels as the target output, and the classification accuracy is then tested. This broad phoneme classifier is later used for broad syllable structure prediction, which is useful for applications such as automatic speech recognition and automatic language identification.
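Several of the signal-level features named in the abstract can be computed per frame with a few lines of code. The following is a minimal sketch, not the authors' implementation: the function name `frame_features`, the naive DFT, the skipping of the DC bin, and the small constant `eps` are all assumptions made for illustration. The voiced/unvoiced decision and formant estimation are left out.

```python
import cmath
import math


def frame_features(frame, sample_rate):
    """Compute a few signal-level features for one short-time frame.

    Hypothetical helper for illustration; `frame` is a list of float samples.
    """
    n = len(frame)

    # Zero crossing rate: fraction of adjacent sample pairs whose signs differ.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (n - 1)

    # Short-time energy: mean squared amplitude of the frame.
    energy = sum(x * x for x in frame) / n

    # Power spectrum from a naive DFT over bins 1..n//2 (DC bin skipped).
    # This is O(n^2), which is acceptable for a single short frame.
    power = []
    for k in range(1, n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        power.append(abs(s) ** 2 / n)

    # Most dominant frequency: the bin with the largest power, converted to Hz.
    peak = max(range(len(power)), key=power.__getitem__)
    dominant_freq = (peak + 1) * sample_rate / n
    dominant_energy = power[peak]

    # Spectral flatness: geometric mean over arithmetic mean of the spectrum.
    # eps (an assumption) guards against log(0) and division by zero.
    eps = 1e-12
    geo_mean = math.exp(sum(math.log(p + eps) for p in power) / len(power))
    arith_mean = sum(power) / len(power) + eps
    flatness = geo_mean / arith_mean

    return {
        "zcr": zcr,
        "energy": energy,
        "dominant_freq": dominant_freq,
        "dominant_energy": dominant_energy,
        "flatness": flatness,
    }
```

A pure tone yields a low ZCR and a flatness close to zero, while a noise-like frame yields a higher ZCR and a flatter spectrum, which is the kind of contrast such features are meant to capture.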

Highlights

  • In terms of human communication, speech is the most important and efficient mode of communication even in today's multimedia society

  • The output labels obtained while testing the classifier using selected words are given below

  • A classifier is developed using a feedforward neural network to automatically label each frame in terms of broad phoneme classes


Summary

INTRODUCTION

In terms of human communication, speech is the most important and efficient mode of communication even in today's multimedia society. The waveform representation of each phoneme is characterized by a small set of distinctive features, where a distinctive feature is a minimal unit that distinguishes between two maximally close but linguistically distinct speech sounds. These acoustic features should not be affected by different. Broad phoneme classes include vowels, nasals, plosives, fricatives, approximants and silence [1][2]. Each of these classes has some discriminant features by which it can be classified. For example, vowels can be characterized by their higher amplitude, whereas fricatives can be characterized by their high zero crossing rate. To identify such characteristics, an analysis study was conducted and the results were summarized.
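The contrast between high-amplitude vowels and high-ZCR fricatives can be illustrated with a toy rule. This is only a sketch under stated assumptions: the function names and the thresholds `energy_floor` and `zcr_high` are hypothetical, and the rule is not the neural network classifier described in the paper.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def rough_label(frame, energy_floor=1e-4, zcr_high=0.3):
    """Very rough frame label; both thresholds are illustrative assumptions."""
    if short_time_energy(frame) < energy_floor:
        return "silence"
    if zero_crossing_rate(frame) > zcr_high:
        return "fricative-like"
    return "vowel-like"
```

In practice, fixed thresholds like these would not separate all six broad classes, which is one reason to train a classifier on several features together.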

SYSTEM OVERVIEW
Data Collection and Transcription
Feature Extraction
Formant Frequencies
EXPERIMENTAL RESULTS
Evaluation
Results and discussion
Application of proposed system for syllable structure prediction
SUMMARY AND SCOPE OF FUTURE WORK