Deep learning for content-based video retrieval in film and television production

Markus Mühling,Matthias Springstein,Uli Veith,Nikolaus Korfhage,Christian Otto,Eric Müller,Bernd Freisleben,Thomas Langelage,Ralph Ewerth

doi:10.1007/s11042-017-4962-9

Abstract

While digitization has changed the workflow of professional media production, the content-based labeling of image sequences and video footage, necessary for all subsequent stages of film and television production, archival or marketing is typically still performed manually and thus quite time-consuming. In this paper, we present deep learning approaches to support professional media production. In particular, novel algorithms for visual concept detection, similarity search, face detection, face recognition and face clustering are combined in a multimedia tool for effective video inspection and retrieval. The analysis algorithms for concept detection and similarity search are combined in a multi-task learning approach to share network weights, saving almost half of the computation time. Furthermore, a new visual concept lexicon tailored to fast video retrieval for media production and novel visualization components are introduced. Experimental results show the quality of the proposed approaches. For example, concept detection achieves a mean average precision of approximately 90% on the top-100 video shots, and face recognition clearly outperforms the baseline on the public Movie Trailers Face Dataset.

Full Text