Abstract

To support hearing-impaired students in the classroom, real-time captioning and note-taking using automatic speech recognition (ASR) have been investigated. However, even with ASR, manual editing is needed to check and correct recognition errors and redundant spoken expressions in the ASR output, which often delays caption presentation. For efficient editing and prompt presentation, we propose an automatic classification of ASR results in terms of their usability as caption text, together with a presentation method based on this classification. In this study, we define usability in terms of syntactic correctness, recognition errors, and redundant spoken expressions in ASR results. Based on this definition, each unit of ASR output is classified as valid, invalid, or to be checked, using hand-crafted rules and a machine learning framework. When presenting captions, valid input is shown promptly, while to-be-checked input is manually edited and then added to the captions. We developed a real-time captioning system incorporating the classification and presentation methods, and conducted a trial of the system in a university lecture.
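The three-way classification and presentation flow described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the filler list, the confidence threshold, and the rule logic are all assumptions standing in for the hand-crafted rules and machine-learned classifier the abstract mentions.

```python
from enum import Enum, auto

class Label(Enum):
    VALID = auto()          # usable as-is: caption immediately
    INVALID = auto()        # unusable (e.g., filler-only unit): discard
    TO_BE_CHECKED = auto()  # route to a human editor before captioning

# Hypothetical filler set; the actual rules and ML features are not
# specified in the abstract.
FILLERS = {"uh", "um", "er"}

def classify(unit: str, asr_confidence: float) -> Label:
    """Toy stand-in for the rule-based + machine-learning classifier."""
    tokens = unit.lower().split()
    if not tokens or all(t in FILLERS for t in tokens):
        return Label.INVALID
    if asr_confidence < 0.7:  # threshold value is an assumption
        return Label.TO_BE_CHECKED
    return Label.VALID

def present(units):
    """Valid units are captioned promptly; to-be-checked units are
    queued for manual editing and appended to the captions later."""
    captions, edit_queue = [], []
    for text, conf in units:
        label = classify(text, conf)
        if label is Label.VALID:
            captions.append(text)
        elif label is Label.TO_BE_CHECKED:
            edit_queue.append(text)  # edited by hand, then added
    return captions, edit_queue
```

In this sketch, only low-confidence or filler-heavy units incur editing delay, while clean units reach the caption display without waiting, which mirrors the latency reduction the proposed method targets.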
