Abstract

While digitization has changed the workflow of professional media production, the content-based labeling of image sequences and video footage, necessary for all subsequent stages of film and television production, archival or marketing is typically still performed manually and thus quite time-consuming. In this paper, we present deep learning approaches to support professional media production. In particular, novel algorithms for visual concept detection, similarity search, face detection, face recognition and face clustering are combined in a multimedia tool for effective video inspection and retrieval. The analysis algorithms for concept detection and similarity search are combined in a multi-task learning approach to share network weights, saving almost half of the computation time. Furthermore, a new visual concept lexicon tailored to fast video retrieval for media production and novel visualization components are introduced. Experimental results show the quality of the proposed approaches. For example, concept detection achieves a mean average precision of approximately 90% on the top-100 video shots, and face recognition clearly outperforms the baseline on the public Movie Trailers Face Dataset.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call