Abstract

In this paper, we introduce two new confidence measures for large vocabulary speech recognition systems. The major feature of these measures is that they can be computed without waiting for the end of the audio stream. We proposed two kinds of confidence measures: frame-synchronous and local. The frame-synchronous ones can be computed as soon as a frame is processed by the recognition engine and are based on a likelihood ratio. The local measures estimate a local posterior probability in the vicinity of the word to analyze. We evaluated our confidence measures within the framework of the automatic transcription of French broadcast news with the EER criterion. Our local measures achieved results very close to the best state-of-the-art measure (EER of 23% compared to 22.0%). We then conducted a preliminary experiment to assess the contribution of our confidence measure in improving the comprehension of an automatic transcription for the hearing impaired. We introduced several modalities to highlight words of low confidence in this transcription. We showed that these modalities used with our local confidence measure improved the comprehension of automatic transcription.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.