Abstract

This paper proposes a novel approach for the automatic segmentation of speech signals into phonemes. In a well-spoken word, phonemes can be characterized by the changes observed in the speech waveform. To locate phoneme boundaries, the signal-level properties of the speech waveform, i.e., the changes in the waveform during the transition from one phoneme to the next, are explored. The problem of phoneme-level segmentation is addressed from two aspects: (1) segmentation of phonemes between voiced and unvoiced portions, and (2) segmentation of phonemes within voiced and unvoiced regions. Pitch and the zero-frequency filtered signal are used to find the regions of change from voiced to unvoiced and vice versa. Phoneme boundaries within voiced and unvoiced regions are approximated using the properties of the power spectrum of the correlation of adjacent frames of the signal. A finite set of rules is proposed based on the variations observed in the power spectrum during phoneme transitions. The segmentation results of both approaches are combined to obtain the final phoneme boundaries. Three databases are used to test the proposed approach: the TIMIT Corpus, the IIIT Hyderabad Marathi database, and the IIIT Hyderabad Hindi database (IIIT-H Indic Speech Databases). Accuracies of 95.40%, 96.87%, and 96.12%, respectively, are achieved within a tolerance of 10 ms. The results of the proposed approach are observed to give precise phoneme boundaries.
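
The accuracy figures above are reported within a 10 ms tolerance. As a minimal sketch of how such a tolerance-based score could be computed, the Python function below counts a reference boundary as correctly detected when some detected boundary lies within the tolerance. This scoring rule, the function name `boundary_accuracy`, and the choice of integer milliseconds for boundary times are assumptions made for illustration; the abstract does not specify the exact evaluation protocol.

```python
from bisect import bisect_left


def boundary_accuracy(reference_ms, detected_ms, tolerance_ms=10):
    """Fraction of reference boundaries with a detected boundary within tolerance.

    Boundary times are integers in milliseconds (an assumption of this sketch).
    One detected boundary may match more than one reference boundary here;
    a stricter one-to-one matching would need extra bookkeeping.
    """
    if not reference_ms:
        raise ValueError("reference_ms must contain at least one boundary")

    detected_sorted = sorted(detected_ms)
    hits = 0
    for ref in reference_ms:
        # The nearest detected boundary is either just before or at/after ref.
        i = bisect_left(detected_sorted, ref)
        neighbours = detected_sorted[max(i - 1, 0):i + 1]
        if any(abs(ref - d) <= tolerance_ms for d in neighbours):
            hits += 1
    return hits / len(reference_ms)


# Hypothetical boundary times, for illustration only.
reference = [100, 250, 400, 520]
detected = [104, 262, 395, 520, 600]
score = boundary_accuracy(reference, detected)
```

In this made-up example, three of the four reference boundaries have a detected boundary within 10 ms (the boundary at 250 ms is missed by 12 ms), so `score` is 0.75. Spurious detections such as the one at 600 ms do not lower this particular score, which is one reason a real evaluation might also report insertions.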
