Abstract

Speech transcription is important for video/audio retrieval and many other applications. In automatic speech transcription, recognition errors are inevitable, which makes user feedback such as manual error correction necessary. In this paper, an approach is proposed to improve the accuracy of speech transcription by exploiting user feedback and word repetition. The method learns from user feedback and the recognition results of preceding utterances, and then corrects errors when repeated words are falsely recognized in subsequent utterances. An interaction scheme for user feedback is proposed, which facilitates error correction through candidate lists and provides a new kind of feedback, referred to as word indication, that extends error correction from repeated words to repeated phrases. For template extraction and matching, a representation of word templates and recognition results based on the syllable confusion network (SCN) is proposed. During transcription, SCN-based templates of multi-syllable words/phrases are extracted from user feedback and the N-best lattice, and then matched against the SCNs of subsequent utterances to yield a new candidate list whenever repeated words are detected. Experimental results show that considerable error reduction is achieved in the newly generated candidate lists.
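The template-matching step described above can be illustrated with a minimal sketch. Here an SCN is modeled as a sequence of "slots", each mapping candidate syllables to posterior probabilities, and a word template is a syllable sequence slid across those slots. The data structures, the log-posterior scoring rule, the threshold, and the toy syllables are all illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of matching a multi-syllable word template
# against a syllable confusion network (SCN). All names and values
# here are assumptions for illustration only.

import math

def match_template(template, scn, threshold=-3.0):
    """Slide `template` (a list of syllables) over `scn` (a list of
    dict[syllable -> posterior]); return (start, log_score) for every
    window whose summed log-posterior exceeds `threshold`."""
    hits = []
    for start in range(len(scn) - len(template) + 1):
        score = 0.0
        for syl, slot in zip(template, scn[start:start + len(template)]):
            p = slot.get(syl, 1e-6)  # tiny floor for syllables absent from the slot
            score += math.log(p)
        if score > threshold:
            hits.append((start, score))
    return hits

# Toy SCN for an utterance of four syllable slots.
scn = [
    {"bei": 0.7, "pei": 0.3},
    {"jing": 0.8, "qing": 0.2},
    {"huan": 0.6, "wan": 0.4},
    {"ying": 0.9, "yin": 0.1},
]

# Template for a repeated word (here "bei jing") learned from user feedback.
hits = match_template(["bei", "jing"], scn)
print(hits)
```

In a real system the matched window would then be used to regenerate the candidate list for the detected repetition; here the function simply reports where the template scores well.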
