Abstract

An accurate, properly labeled speech corpus is essential for speech research. However, manual segmentation and labeling are laborious and error-prone. This paper describes an automatic tool for segmenting and labeling Malayalam speech data. The tool is based on Hidden Markov Models (HMMs), and the HMM Tool Kit is used for training, segmentation, and labeling. Special care was taken in preparing the pronunciation dictionary so that it covers most of the possible pronunciation variations. Syllabification rules are applied to the phone labels to generate syllable labels as well. Segmentation and labeling experiments were carried out on a speech corpus collected for building a text-to-speech system. The tool performs reasonably well, showing an average deviation of only 19 ms from manual labels.
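
The abstract does not say exactly how the average deviation is computed. One common reading is the mean absolute difference between automatically placed and manually placed boundary times. The sketch below illustrates that reading only; the function name and the boundary values are hypothetical and are not taken from the paper.

```python
def mean_boundary_deviation_ms(auto_boundaries, manual_boundaries):
    """Mean absolute difference between paired boundary times, in milliseconds.

    Both arguments are sequences of boundary times in seconds, assumed to be
    aligned one-to-one (same label sequence in both annotations).
    """
    if len(auto_boundaries) != len(manual_boundaries):
        raise ValueError("boundary lists must have the same length")
    if not auto_boundaries:
        raise ValueError("boundary lists must not be empty")
    total = sum(abs(a - m) for a, m in zip(auto_boundaries, manual_boundaries))
    return 1000.0 * total / len(auto_boundaries)


# Made-up boundary times in seconds, for illustration only.
auto = [0.10, 0.25, 0.50]
manual = [0.12, 0.24, 0.47]
deviation = mean_boundary_deviation_ms(auto, manual)  # roughly 20 ms here
```

Under this assumption, a figure such as 19 ms would mean that automatic boundaries fall, on average, about 19 ms away from the manual ones.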

