Abstract

Building a large vocabulary continuous speech recognition (LVCSR) system requires a lot of hours of segmented and labelled speech data. Arabic language, as many other low-resourced languages, lacks such data, but the use of automatic segmentation proved to be a good alternative to make these resources available. In this paper, we suggest the combination of hidden Markov models (HMMs) and support vector machines (SVMs) to segment and to label the speech waveform into phoneme units. HMMs generate the sequence of phonemes and their frontiers; the SVM refines the frontiers and corrects the labels. The obtained segmented and labelled units may serve as a training set for speech recognition applications. The HMM/SVM segmentation algorithm is assessed using both the hit rate and the word error rate (WER); the resulting scores were compared to those provided by the manual segmentation and to those provided by the well-known embedded learning algorithm. The results show that the speech recognizer built upon the HMM/SVM segmentation outperforms in terms of WER the one built upon the embedded learning segmentation of about 0.05%, even in noisy background.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.