Abstract
This paper describes an experimental system that can create good-quality subtitle files for video clips derived from broadcast content. The system is designed to run automatically, without the need for human verification. The approach uses existing metadata sources, an off-air broadcast archive and an archive of original subtitle files, together with audio fingerprinting and speech-to-text technology, to identify the source programme. It then locates the position of the video clip within that programme, verifies the match between the video clip and the subtitles, and creates a new subtitle file. The paper also reports results from a large corpus of over 7,000 video clips and from further, smaller sets of clips drawn from different television genres, and explores where improvements might be made. Finally, it examines the limitations of the current approach and discusses alternative methods for providing subtitles for video clips.
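The pipeline summarised above can be illustrated with a minimal sketch. All names and data structures here are hypothetical simplifications (fingerprints are modelled as plain sequences and matched exactly, whereas real audio fingerprinting is approximate); the sketch only shows the three core steps: identify the source programme, locate the clip's offset within it, and retime the matching subtitle cues.

```python
def find_offset(needle, haystack):
    """Locate the clip's position in the full programme.

    Exact subsequence matching stands in for approximate
    audio-fingerprint matching in a real system.
    """
    for start in range(len(haystack) - len(needle) + 1):
        if haystack[start:start + len(needle)] == needle:
            return start
    return None


def identify_programme(clip_fingerprint, archive):
    """Match the clip's fingerprint against an off-air broadcast archive.

    `archive` maps programme IDs to whole-programme fingerprints.
    Returns (programme_id, offset) or (None, None) if no match.
    """
    for programme_id, fingerprint in archive.items():
        offset = find_offset(clip_fingerprint, fingerprint)
        if offset is not None:
            return programme_id, offset
    return None, None


def retime_subtitles(cues, offset, clip_length):
    """Shift subtitle cues from programme time to clip time.

    `cues` is a list of (start, end, text) tuples in programme time;
    only cues falling wholly inside the clip window are kept.
    """
    return [(start - offset, end - offset, text)
            for start, end, text in cues
            if start >= offset and end <= offset + clip_length]
```

For example, a clip whose fingerprint is `[3, 4, 5]` would be found at offset 2 in a programme fingerprinted as `[1, 2, 3, 4, 5, 6]`, and cues from that window would be shifted back by 2 time units to start from the beginning of the clip.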