Abstract

We propose a new approach to phonetic transcription of text that uses automatic speech recognition (ASR) to assist traditional grapheme-to-phoneme (G2P) techniques. The approach was applied to transcribing Chinese text into Taiwanese phonetic symbols. The text is paired with a speech recording of it, and ASR is run over a sausage searching net constructed from the multiple candidate pronunciations of the text, so that the recognizer selects the pronunciation actually spoken; this reduces the phonetic transcription error rate. Using a pronunciation lexicon with multiple pronunciations for each item, a transcription error rate of 12.74% was achieved. Further improvement was obtained by adapting the pronunciation lexicon with pronunciation variation (PV) rules derived manually from corrected transcriptions in a speech corpus. The PV rules fall into two categories: knowledge-based and data-driven. Incorporating the PV rules reduced the error rate to 10.56%. Although the technique was developed for Taiwanese speech, it could easily be adapted to other spoken Chinese languages or dialects.
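
As a rough illustration of the sausage searching net mentioned above, the following Python sketch builds one slot per text item from a multi-pronunciation lexicon and picks the best-scoring candidate in each slot. Everything in it is an assumption made for illustration: the function names, the two-entry toy lexicon, and the numeric scores (which stand in for whatever acoustic scores a real recognizer would produce) do not come from the paper, and the per-slot selection is a simplification of a real ASR search.

```python
from typing import Callable, Dict, List

Sausage = List[List[str]]  # one slot per text item, each slot lists candidates


def build_sausage(words: List[str], lexicon: Dict[str, List[str]]) -> Sausage:
    """Build a sausage net: slot i holds every lexicon pronunciation of words[i]."""
    return [list(lexicon[w]) for w in words]


def decode(sausage: Sausage, score: Callable[[int, str], float]) -> List[str]:
    """Pick the highest-scoring candidate in each slot.

    score(i, pron) is a stand-in for the recognizer's score of candidate
    `pron` at slot i (hypothetical; a real system scores against audio).
    """
    return [max(slot, key=lambda p, i=i: score(i, p)) for i, slot in enumerate(sausage)]


def error_rate(hypothesis: List[str], reference: List[str]) -> float:
    """Fraction of slots whose chosen pronunciation differs from the reference."""
    if len(hypothesis) != len(reference):
        raise ValueError("hypothesis and reference must have the same length")
    if not reference:
        return 0.0
    wrong = sum(h != r for h, r in zip(hypothesis, reference))
    return wrong / len(reference)


# Illustrative toy data (not from the paper): two candidate readings per character.
TOY_LEXICON = {"大": ["tai7", "tua7"], "學": ["hak8", "oh8"]}
# Made-up scores standing in for acoustic log-likelihoods.
TOY_SCORES = {(0, "tai7"): -1.0, (0, "tua7"): -3.0, (1, "hak8"): -0.5, (1, "oh8"): -2.0}

sausage = build_sausage(["大", "學"], TOY_LEXICON)
hypothesis = decode(sausage, lambda i, p: TOY_SCORES[(i, p)])
```

In this sketch, adapting the lexicon with PV rules would amount to adding further candidates to each slot before decoding; how the paper actually applies its rules is not specified in the abstract.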
