Abstract

In speech synthesis and recognition, segmentation is an important step, and the results of all subsequent steps depend completely on it. Several effective segmentation methods exist in the literature, but for Vietnamese speech, researchers usually rely on their own experience to set the window length when using a sliding window. This often leads to inefficient segmentation, after which they must retry with a different value for the voice length. In this paper, we propose a method that supports the segmentation of Vietnamese speech and automatically determines suitable lengths for voices and silent pauses. We first estimate, experimentally, the minimum and average lengths of a voice and of a silent pause in Vietnamese speech for three main speaking rates (slow, normal, and fast). Then, based on these values, we segment the voices and pauses using a sliding window with the proposed algorithm. Experimental results show that the proposed method can effectively segment Vietnamese speech.
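
The abstract does not specify the proposed algorithm, so the following is only a minimal sketch of the general idea it describes: slide a fixed-length window over the signal, label each window as voice or pause, and then enforce minimum lengths for both. The energy threshold, the function names (`frame_energies`, `segment`), and all parameter values are hypothetical and are not taken from the paper. In particular, the minimum lengths would in practice come from the measured values for each speaking rate.

```python
from typing import List, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping window of frame_len samples."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def segment(
    samples: Sequence[float],
    frame_len: int,
    threshold: float,          # assumed energy threshold separating voice from pause
    min_voice_frames: int,     # assumed minimum voice length, in windows
    min_pause_frames: int,     # assumed minimum pause length, in windows
) -> List[Tuple[str, int, int]]:
    """Return (label, start_frame, end_frame) runs, with end_frame exclusive."""
    labels = [
        "voice" if e > threshold else "pause"
        for e in frame_energies(samples, frame_len)
    ]

    # Collapse consecutive identical labels into runs.
    runs: List[list] = []
    for idx, lab in enumerate(labels):
        if runs and runs[-1][0] == lab:
            runs[-1][2] = idx + 1
        else:
            runs.append([lab, idx, idx + 1])

    # Absorb any run shorter than its minimum length into the preceding run,
    # then rejoin neighbours that now carry the same label.
    # A too-short run at the very start is kept, since nothing precedes it.
    merged: List[list] = []
    for lab, start, end in runs:
        min_len = min_voice_frames if lab == "voice" else min_pause_frames
        if merged and (end - start) < min_len:
            merged[-1][2] = end
        elif merged and merged[-1][0] == lab:
            merged[-1][2] = end
        else:
            merged.append([lab, start, end])
    return [tuple(r) for r in merged]


# Synthetic signal: silence, voice, a very short gap, voice, silence.
signal = [0.0] * 40 + [1.0] * 40 + [0.0] * 10 + [1.0] * 30 + [0.0] * 40
result = segment(signal, frame_len=10, threshold=0.25,
                 min_voice_frames=2, min_pause_frames=2)
```

In this synthetic example the one-window gap is shorter than the assumed minimum pause, so it is absorbed and the two voiced stretches are reported as a single segment. Choosing the minimum lengths by hand is exactly the trial-and-error step the abstract says the proposed method aims to avoid.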
