Abstract

This paper proposes the first phrase-level transduction model with reordering that transforms colloquial speech directly into written-style transcription. The model can perform n-to-m transductions. It is trained on a parallel corpus of verbatim and written-style transcriptions. Deletions, substitutions, and insertions are all well represented by the model, and inversion transduction cases can also be identified and represented. We implement the transduction model using weighted finite-state transducers (WFSTs) and integrate it into a WFST-based speech recognition search space to produce both verbatim speaking-style and written-style transcriptions. Evaluations on Cantonese speech to standard written Chinese show an 11.59% relative Word Error Rate (WER) reduction over an interpolated language model between Cantonese and standard Chinese speech, and a 5.72% relative WER reduction and a 14.82% relative Bilingual Evaluation Understudy (BLEU) improvement over the word-level transduction model.
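To make the idea of phrase-level n-to-m transduction concrete, the following is a minimal sketch, not the paper's WFST implementation. It assumes a small hand-written phrase table (the entries and the names `PHRASE_TABLE` and `transduce` are hypothetical, chosen for illustration) and applies it with a greedy longest-match pass rather than a weighted search. It shows how a single table can express a substitution, a deletion, a 2-to-1 mapping, and an inversion captured as a phrase pair.

```python
# Hypothetical phrase table: verbatim Cantonese tokens -> written Chinese tokens.
# These entries are illustrative assumptions, not taken from the paper's corpus.
PHRASE_TABLE = {
    ("佢",): ("他",),            # substitution, 1 -> 1
    ("啦",): (),                 # deletion of a sentence-final particle, 1 -> 0
    ("唔", "係"): ("不是",),      # 2 -> 1
    ("行", "先"): ("先", "走"),   # inversion handled as one phrase pair, 2 -> 2
}


def transduce(tokens, table=PHRASE_TABLE):
    """Greedy longest-match phrase substitution (a stand-in for WFST search)."""
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in table:
                out.extend(table[key])
                i += n
                break
        else:
            # No phrase matched: copy the token through unchanged.
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    print(transduce(["我", "行", "先", "啦"]))
    print(transduce(["佢", "唔", "係", "學生"]))
```

A weighted model would instead score competing segmentations and choose the best path; the greedy pass here only illustrates the shape of the phrase-level mappings.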
