Abstract

This paper introduces a lecture video corpus, Autoblog 2020. With the increase of online learning in universities, there is a demand for systematic toolchains for lecture video processing. However, existing lecture video corpora do not satisfy the requirements of such tasks, and lecture transcription and analysis remain relatively unexplored areas in speech and natural language research. The Autoblog 2020 Corpus is developed toward the end goal of free video-to-blog-post conversion software that makes video presentations more accessible. This software will include automatic editing of disfluencies, automatic speech recognition (ASR), and spoken term extraction, so that researchers can process and share their content more efficiently. In this paper, we describe the corpus and present linguistic analyses and preliminary experimental results on ASR, keyword extraction, and segmentation. These results will inform future work on developing the video-to-blog-post conversion software.