Abstract
Speech sentences are the input to automatic phonetic segmentation or transcription. This paper describes our work on automatically segmenting speech sentences from multi-paragraph speech databases, with the goal of building a Text-To-Speech (TTS) speech corpus automatically. We present (a) a system that segments speech sentences from broadcast audio using forced alignment, together with a checking mechanism based on speech recognition; (b) an iterative algorithm that improves the system; and (c) a music detector that combines a Variable Duration Hidden Markov Model (VDHMM) with a Gaussian Mixture Model (GMM). Experiments show that the improved system achieves a Sentence Accurate Rate (SAR) of 98.93% and generates 646 correct sentences, compared with an SAR of 97.85% and 155 correct sentences for the original system.