Abstract

Accurately modeling the acoustic variability caused by coarticulation is important in continuous speech recognition. Recent research indicates that syllable units model intra-syllable coarticulation better than sub-syllable units. However, most continuous Mandarin speech recognition systems use context-dependent phones or Initial/Final (IF) units as the basic acoustic unit, because it is difficult to collect enough data to train longer units. Here we present a syllable-based approach consisting of two steps. First, context-independent syllable-based acoustic models are trained; to address training-data sparsity, these models are initialized from intra-syllable IF-based diphones. Second, inter-syllable coarticulation is captured by incorporating inter-syllable transition models into the recognition system. Experimental results show that an acoustic model built with this approach improves recognition performance.

Index Terms: speech recognition, modeling unit selection, coarticulation
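The two steps can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each trained unit model is simply a list of per-state mean vectors, that a syllable model is initialized by concatenating the states of its two intra-syllable IF diphones, and that one transition unit is inserted between adjacent syllables at decoding time. The function names, the HTK-style diphone labels (`n+i`, `n-i`), and the `~` transition label are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

State = List[float]   # toy stand-in for one HMM state's parameters
Model = List[State]   # a unit model as a left-to-right list of states


def init_syllable_model(
    syllable: str,
    lexicon: Dict[str, Tuple[str, str]],
    if_models: Dict[str, Model],
) -> Model:
    """Step 1 (assumed form): seed a syllable model from its IF diphones."""
    initial, final = lexicon[syllable]
    if not initial:
        # Zero-initial syllable: fall back to the final's own model (assumption).
        return [list(s) for s in if_models[final]]
    initial_states = if_models[f"{initial}+{final}"]  # initial with right context
    final_states = if_models[f"{initial}-{final}"]    # final with left context
    # Copy the states so later re-estimation does not alter the IF models.
    return [list(s) for s in initial_states + final_states]


def expand_with_transitions(
    syllables: List[str],
    lexicon: Dict[str, Tuple[str, str]],
) -> List[str]:
    """Step 2 (assumed form): interleave inter-syllable transition units."""
    units: List[str] = []
    for i, syl in enumerate(syllables):
        if i > 0:
            prev_final = lexicon[syllables[i - 1]][1]
            initial, final = lexicon[syl]
            onset = initial or final  # zero-initial syllables start with the final
            units.append(f"{prev_final}~{onset}")
        units.append(syl)
    return units


# Toy data, invented for this example only.
lexicon = {"ni": ("n", "i"), "hao": ("h", "ao")}
if_models = {
    "n+i": [[0.0], [1.0]],
    "n-i": [[2.0], [3.0], [4.0]],
}

ni_model = init_syllable_model("ni", lexicon, if_models)
unit_sequence = expand_with_transitions(["ni", "hao"], lexicon)
```

In this sketch the syllable model starts from parameters that were estimated on the more plentiful IF data, which reflects the motivation stated above for the initialization step; the syllable-level re-training that would follow is omitted.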
